One of the requirements any offering in the load balancer space has to fulfil is the ability to periodically check the health of its back end servers.
The health of a back end server can be defined in several ways:
- the ability to respond to ping
- the ability to perform a TCP handshake
- the ability for a server application (e.g. an HTTP server) to respond with meaningful data to a request
- (your favourite method here …)
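To make the second definition concrete, here is a minimal sketch of a TCP handshake check. It is only an illustration, not the hcd design: the function name `tcp_check` and the default timeout are my own assumptions.

```python
import socket


def tcp_check(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) completes in time.

    Hypothetical helper for illustration; the 2 second default is an
    arbitrary assumption, not a value taken from the ILB design.
    """
    try:
        # create_connection performs the full three-way handshake;
        # the socket is closed again as soon as the with-block exits.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution errors.
        return False
```

A check like this says nothing about whether the application behind the port returns meaningful data; that would need the third kind of check from the list.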
In addition to the check, there must be a way to report either the current health status of a given server or a change in that status (i.e. when a server dies or comes back to life).
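The two reporting styles (query the current status, or be told about a change) can be sketched with a small in-memory tracker. The class and method names are hypothetical, and treating the first observation of a server as a silent baseline is my own assumption.

```python
from typing import Dict, Optional


class StatusTracker:
    """Remembers the last known health of each server (illustrative only)."""

    def __init__(self) -> None:
        self._alive: Dict[str, bool] = {}

    def status(self, server: str) -> Optional[bool]:
        """Current status: True, False, or None if never checked."""
        return self._alive.get(server)

    def update(self, server: str, alive: bool) -> Optional[str]:
        """Record a check result; return a message only on a status change."""
        previous = self._alive.get(server)
        self._alive[server] = alive
        if previous is None or previous == alive:
            # First observation (assumed to be a baseline) or no change.
            return None
        return f"{server} {'came back to life' if alive else 'died'}"
```

Since nothing here is written to disk, a restart simply starts again from an empty table.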
We examined a few open-source network monitoring tools (I think Nagios was among them, as well as OpenNMS … I wasn’t too deeply involved in this part, so I don’t know the details), but came to the conclusion that none was suited well enough for our purposes, so we decided we’d need to build our own. We still need to finalise the design, but I think I can give a basic outline of what will be required for a health check (HC) subsystem within the ILB project, as well as some of the requirements on other parts of the ilb project to accommodate HC:
- HC will (initially) be private to ilb.
- we plan to implement this as a daemon, i.e. hcd (health check daemon).
- lbadm, the tool to administer ilb, will also be the only means to administer hcd.
- hcd will not maintain any persistent state.
- for this release, all back end servers for an lb rule will be checked by the same health check.
- as a consequence of the above, since a server can be part of more than one rule, it must be possible to perform several checks on the same server.
- ilb will be able to distinguish between permanent removal of a back end server from a rule (e.g. by an administrator) and temporary removal (e.g. when it is unreachable over the network).
- hcd will implement some way of logging the fact that a server has died (e.g. using syslog).
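Since the design isn't final, take the following only as a rough sketch of how a few of these requirements could fit together: one health check per rule, the same server checked once per rule it belongs to, a distinction between administrator and health check removal, and a log entry when a server dies. All names (`Rule`, `ServerState`, `checks_to_run`, the `hcd` logger) are hypothetical, and Python's `logging` module stands in for syslog.

```python
import logging
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Tuple

log = logging.getLogger("hcd")  # hypothetical logger name


class ServerState(Enum):
    ENABLED = "enabled"
    DISABLED_BY_ADMIN = "disabled by administrator"  # permanent removal
    DISABLED_BY_HC = "disabled by health check"      # temporary removal


@dataclass
class Rule:
    name: str
    health_check: str  # one health check for all servers of the rule
    servers: Dict[str, ServerState] = field(default_factory=dict)

    def add_server(self, addr: str) -> None:
        self.servers[addr] = ServerState.ENABLED

    def disable_by_admin(self, addr: str) -> None:
        self.servers[addr] = ServerState.DISABLED_BY_ADMIN

    def record_check(self, addr: str, alive: bool) -> None:
        """Apply a check result without overriding the administrator."""
        state = self.servers[addr]
        if state is ServerState.DISABLED_BY_ADMIN:
            return  # assumption: an administrator's decision always wins
        if not alive and state is ServerState.ENABLED:
            log.warning("rule %s: server %s has died", self.name, addr)
        self.servers[addr] = (
            ServerState.ENABLED if alive else ServerState.DISABLED_BY_HC
        )


def checks_to_run(rules: List[Rule]) -> List[Tuple[str, str, str]]:
    """One (rule, server, check) entry per rule membership, so a server
    that belongs to several rules is checked several times."""
    return [(r.name, addr, r.health_check) for r in rules for addr in r.servers]
```

Keeping the state per rule rather than per server means a server can be marked dead for one rule (say, its HTTP check fails) while staying in service for another.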
I drew a crude picture of how I believe hcd fits into the rest of the ilb infrastructure (so far). I didn’t spend much time on it, nor am I a born artist with electronic paint tools, so I’ll ask you to excuse the craftsmanship and concentrate on the content 😉