And how are you feeling today?

One of the requirements that a load balancer offering must fulfil is the ability to periodically check the health of its back end servers.

The health of a back end server can be defined in several ways:

  1. the ability to respond to ping
  2. the ability to perform a TCP handshake
  3. the ability for a server application (e.g. an HTTP server) to respond with meaningful data to a request
  4. (your favourite method here …)
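To make definitions 2 and 3 concrete, here is a minimal sketch in Python. It is not the hcd design (which is not finalised); the function names `tcp_check` and `http_check`, the timeouts, and the "2xx status plus expected substring" rule for "meaningful data" are all assumptions made for illustration.

```python
import http.client
import socket
import urllib.request


def tcp_check(host, port, timeout=2.0):
    """Return True if a TCP handshake with host:port completes in time."""
    try:
        # create_connection() returns only after the handshake succeeded.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, timed out, or name resolution failed.
        return False


def http_check(url, timeout=2.0, expect=b""):
    """Return True if url answers with a 2xx status and a body containing expect.

    What counts as "meaningful data" is an assumption here: a successful
    status code plus an optional expected byte string in the body.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300 and expect in resp.read()
    except (OSError, ValueError, http.client.HTTPException):
        # URLError and HTTPError are OSError subclasses; ValueError
        # covers malformed URLs.
        return False
```

A ping-style check (definition 1) is left out because sending ICMP echo requests usually needs raw sockets and elevated privileges.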

In addition to the check, there must be a way to report either the health status of a given server or a change in that status (i.e. when a server dies or comes back to life).
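Both reporting styles can be sketched with one small in-memory structure. Everything below is hypothetical: the class name `StatusTracker`, the callback signature, and the choice to record the first observation of a server silently rather than report it as a change.

```python
class StatusTracker:
    """Remember the last known health of each server and report transitions."""

    def __init__(self, report):
        self._report = report  # called as report(server, alive) on a change
        self._alive = {}       # server -> last observed health (True/False)

    def update(self, server, alive):
        """Record a check result; invoke the callback only if the status changed."""
        previous = self._alive.get(server)
        self._alive[server] = alive
        # Assumption: the very first observation is not treated as a change.
        if previous is not None and previous != alive:
            self._report(server, alive)

    def status(self, server):
        """Return the last known health, or None if the server was never checked."""
        return self._alive.get(server)
```

Polling `status()` covers the first style (report the current health), and the callback covers the second (report only when a server dies or comes back to life).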

We examined a few open-source network monitoring tools (I think Nagios was among those tools, as well as OpenNMS … I wasn’t too deeply involved in this part, so I don’t know the details), but came to the conclusion that none was suited well enough for our purposes, so we decided we’d need to build our own. We still need to finalise the design, but I think I can give a basic outline of what will be required for a health check (HC) subsystem within the ILB project, as well as some of the requirements on other parts of the ILB project to accommodate HC:

  • HC will (initially) be private to ilb.
  • we plan to implement this as a daemon, i.e. hcd (health check daemon).
  • lbadm, the tool to administer ilb, will also be the only means to administer hcd.
  • hcd will not maintain any persistent state.
  • for this release, all back end servers for an lb rule will be checked by the same health check.
  • as a consequence of the above, since a server can be part of more than one rule, it must be possible to perform several checks on the same server.
  • ilb will be able to distinguish between permanent removal of a back end server from a rule (e.g. by an administrator) and temporary removal (e.g. when it is unreachable over the network).
  • hcd will implement some kind of capability to log the fact that a server has died (e.g. using syslog).
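A few of these points can be illustrated with a rough Python sketch. It is only an illustration under stated assumptions, not the planned hcd or ilb design: the names `Rule` and `Backend`, their fields, and the use of Python's `logging` module in place of syslog are all invented here. State lives in memory only, each rule carries exactly one check, and a server shared by two rules is tracked (and checked) once per rule.

```python
import logging
from dataclasses import dataclass, field
from typing import Callable, Dict, List

log = logging.getLogger("hcd")  # stand-in for syslog in this sketch


@dataclass
class Backend:
    address: str
    alive: bool = True  # temporary state, flipped by health checks


@dataclass
class Rule:
    name: str
    check: Callable[[str], bool]  # one health check for all servers of the rule
    backends: Dict[str, Backend] = field(default_factory=dict)

    def add_server(self, address: str) -> None:
        self.backends[address] = Backend(address)

    def remove_server(self, address: str) -> None:
        """Permanent removal, e.g. requested by an administrator."""
        self.backends.pop(address, None)

    def run_checks(self) -> None:
        """Temporary removal: a failing server stays in the rule, marked dead."""
        for backend in self.backends.values():
            alive = self.check(backend.address)
            if alive != backend.alive:
                log.warning("rule %s: server %s is %s", self.name,
                            backend.address, "alive" if alive else "dead")
            backend.alive = alive

    def eligible(self) -> List[str]:
        """Servers that may currently receive traffic for this rule."""
        return [b.address for b in self.backends.values() if b.alive]
```

Because health is stored per rule, the same address can be healthy for one rule and dead for another, which is what performing several checks on the same server implies.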

I drew a crude picture of what I believe represents how hcd fits into the rest of the ilb infrastructure (so far) – I didn’t spend much time on it, nor am I a born artist with electronic paint tools, so I’ll ask you to excuse the craftsmanship and concentrate on the content 😉


1 Response to And how are you feeling today?

  1. Feeling quite well, thank you for asking 😉
    Not quite sure how your evaluation of OpenNMS was performed, but OpenNMS performs health checks such as those you described millions of times a day, and this is only a single component of the system (event, notification, and performance management). A complete HTTP transaction can be tested by:
    – connecting to a URL (virtual domains supported)
    – getting a cookie and following a redirect
    – logging in and verifying against success criteria
    – filling out a form and submitting it
    – checking the results
    – logging out
    The response time of this transaction can then be graphed and used by the thresholding process to detect slow responses, and the availability of the service is tracked and can be shown in availability reports.
    I responded to this post because I spend 30% of my consulting time with OpenNMS helping enterprises and service providers with exactly this same requirement. Often, we migrate custom monitoring applications that have been written by the organization into existing OpenNMS monitors or we build an enhancement on site. I think it is much easier to add functionality to a monitoring platform than to write a new platform when you really only need a new monitor.
    Just FYI and good luck with your development.
