I consider blogs to be "work in progress", but this entry seems to be
even more so – and since it’s also describing work in progress, it’s somewhat recursive 🙂
One of the pieces still missing from (Open)Solaris is the capability to
forward incoming IP packets to a set of (more than one) hosts from within the
kernel, i.e. to do load balancing.
The main benefit of an in-kernel load balancer over a userland-based one
is the much reduced flow of networking data ("payload") across the
kernel/userland boundary. Crossing this boundary is known to be expensive,
so the fact that we incur less of it means that – all other
things being equal – we can achieve better performance, both in
connections per second and in throughput.
To address this, we recently created a prototype with very basic load
balancing capabilities that we’re hoping to put out on opensolaris.org
once all the formalities (read: legal stuff) have been completed. You
may have seen Sangeeta’s email proposing this project for opensolaris: http://www.opensolaris.org/jive/thread.jspa?threadID=64639&tstart=0.
We’re also going to be soliciting input from people who would like to
actively test this prototype.
We realise that a full product offering around a load balancer is
unlikely to be achievable in a timeframe that makes sense from the point
of view of the addressable market, so we’re going to concentrate on
providing the infrastructure necessary for developers and OEMs to
optimally exploit this new capability. (Plans for *when* this is going
to happen, and what exactly is going to be in which delivery, aren’t
quite finalised, so please bear with us …)
Even before we release the code, I think I can present a short overview
of what the prototype consists of. We have:
- the in-kernel forwarding engine ("ilb" = internal load balancer, a name
we also use for the whole project …)
- the command-line utility ("lbadm").
Things like redundancy (i.e. failover), backend server health checks etc.
were not implemented for the POC.
My task was and is to define the requirements for, and then design and
implement the CLI. While this sounds rather straightforward, the devil’s
in the detail, as usual.
Here are some of the questions asked of the CLI and of the CLI/kernel
module combination, together with their answers:
- what does the CLI do? (that’s the obvious one 😉)
A: administer all ILB rules and display associated information.
- what is the "unit of currency" the ilb handles?
A: (as indicated above) a rule. A rule consists of:
a. a set of conditions to be met by the incoming packet
b. the destination for a packet that matches the above conditions
c. additional information for the load balancer.
- is there precedent in Solaris for similar functionality (i.e., do we want to look at dladm or perhaps zfs)?
A: the model we chose to follow is flowadm (coming with the Crossbow project, not yet in Solaris). The
basic structure is
command subcommand [options] [object]
and a subcommand is always of the form "verb-object", e.g. "show-flow" or,
in the case of lbadm, "create-rule". The object in our case is the rule.
- how do we structure the CLI?
A: for the prototype, the CLI was one monolithic, stand-alone binary.
- how does the CLI talk to the kernel?
A: for communication between CLI and kernel, we created a data structure to contain all the relevant information and defined an ioctl for passing information to and fro.
- what about statistics?
A: currently, the kernel maintains a basic set of kernel statistics (kstats); some of them for the whole module, some on a per-rule basis and some on a per-backend server basis. For the prototype, I created a shell script to read the data via kstat(1) and perform some mangling on them to produce vmstat(1)-like output.
Some of the additions/modifications that will be implemented by this project:
- the CLI functionality will be split into a library and a CLI consuming
the library. The purpose of this is to enable 3rd parties to make use of
this functionality programmatically.
- integration of statistics display into lbadm.
- addition of failover functionality using VRRP.
- add configuration persistence and integrate with SMF.
- integration with ipnat configuration (see note 1 below).
- implement some form of check for the "health" of backend servers.
- enable management of several hosts as single entities (host pools).
- connection "stickiness".
1) So far, I’ve not explained one major aspect: load-balancing
methods and topologies. Topologies known in the industry are DSR (direct
server return – the load balancer never sees return traffic, or just
forwards it back without any modification) and NAT (half vs. full);
known methods are round-robin and various forms of connection weighting.
ipfilter, which has been in Solaris for quite some time and has been
available as an open-source project for much longer, has some NAT
functionality. For the prototype, we implemented DSR functionality
separate from ipfilter’s NAT functionality, and in no way integrated the administration of ipnat with lbadm.