Byzantine Reality

Searching for Byzantine failures in the world around us

Thoughts on Chubby

The last few times around we’ve been talking about performance. Extreme performance. We’ve talked about how fast the Google File System is, how fast MapReduce is, and how fast BigTable is. Yet all three rely on a common service we hinted at last time, a lock service (amongst other things) named Chubby, the topic of today’s discussion. Chubby doesn’t bill itself as high performance like the other services we’ve mentioned; instead, it is designed with high availability in mind. Let’s dive in further and see what exactly makes up Chubby and why it’s so special in the Google world.

Chubby provides a number of services, but at its heart, it is a lock service. It normally consists of five replicas (as opposed to the usual three from the Google File System) and is considered to be up if at least three are functional (a majority). This is essential because leader election is done via the Paxos algorithm, which requires a majority of the nodes to be up for progress to be made, but as it is the topic of our next discussion, we can brush it under the rug for now.
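To make that majority rule concrete, here’s a tiny sketch of the arithmetic (mine, not the paper’s): with five replicas, any three form a quorum, so the cell can tolerate two replica failures and still make progress.

```cpp
#include <iostream>

// Minimal sketch of Chubby's availability rule: a cell of N replicas can
// make progress only while a strict majority of them are alive, since
// Paxos needs a quorum to elect a master and commit updates.
bool cell_is_up(int total_replicas, int live_replicas) {
    return live_replicas > total_replicas / 2;   // strict majority
}

int main() {
    const int replicas = 5;                      // typical Chubby cell size
    for (int live = 0; live <= replicas; ++live) {
        std::cout << live << " of " << replicas << " replicas alive -> "
                  << (cell_is_up(replicas, live) ? "up" : "down") << "\n";
    }
    // With 5 replicas the cell stays up down to 3 live replicas,
    // i.e. it tolerates two simultaneous replica failures.
}
```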

A main point in the paper is that although Chubby is a lock service, it is also used heavily as a naming service. It names locks in a file-system-like manner, and thus a server can place its location in a file named “/ls/myserver” (ls for lock-server). Since the server holds the lock on that file and renews the lock as long as it is up and providing a service, clients are assured that the data is never stale. Clients can then find this server by querying Chubby for everything in the “/ls/” directory and voila! The naming problem is pretty much solved. The paper refers to these files as “ephemeral files”, namely temporary files used to indicate which clients/servers are alive and which locks are valid. The authors also hint that ephemeral files could indicate whether resources other than locks are valid, such as caches, but that this is currently an unexplored area.
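Here’s a rough sketch of that advertise-and-discover pattern. The ChubbyClient class and its AdvertiseEphemeral/List/GetContents methods are my invention as an in-memory stand-in, not the paper’s actual API; the point is just to show the shape of the pattern.

```cpp
#include <iostream>
#include <map>
#include <string>
#include <vector>

// Hypothetical, in-memory stand-in for a Chubby client library; the real
// API differs, this just illustrates the advertise/discover pattern.
class ChubbyClient {
public:
    // Create an ephemeral file holding `contents`; it conceptually vanishes
    // when the creator's session (and thus its lock/lease) expires.
    void AdvertiseEphemeral(const std::string& path, const std::string& contents) {
        files_[path] = contents;
    }
    // List every file under a directory prefix such as "/ls/".
    std::vector<std::string> List(const std::string& dir) const {
        std::vector<std::string> names;
        for (const auto& [path, contents] : files_)
            if (path.rfind(dir, 0) == 0) names.push_back(path);
        return names;
    }
    std::string GetContents(const std::string& path) const {
        auto it = files_.find(path);
        return it == files_.end() ? "" : it->second;
    }
private:
    std::map<std::string, std::string> files_;
};

int main() {
    ChubbyClient chubby;

    // Server side: publish our location in an ephemeral file; as long as we
    // renew the lock/lease, readers know the address is not stale.
    chubby.AdvertiseEphemeral("/ls/myserver", "10.0.0.7:4321");

    // Client side: discover live servers by listing the /ls/ directory.
    for (const auto& path : chubby.List("/ls/")) {
        std::cout << path << " -> " << chubby.GetContents(path) << "\n";
    }
}
```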

What is explored thoroughly is what Chubby does for other Google services. The paper only states that it is a “rendezvous mechanism” for MapReduce, which I would expand upon and speculate means naming which machines are running Map jobs and Reduce jobs, so that the scheduler can route the output of one to the input of the other in a coherent manner. Both the Google File System and BigTable need to elect master servers (primaries), and Chubby is used in what appears to be exactly the fashion we described in the last paragraph (note this also provides naming for where the primaries are). Finally, they also use it as a repository for files requiring high availability. They cite it as being used for access control lists, and it is a fair presumption that it is likely used for ACLs on metadata for files of particular importance (e.g., files about the filesystem itself or about particularly important data).
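A rough sketch of that election pattern, again with invented names (MasterLock and TryAcquire are mine, not the real API): every candidate tries to grab the lock on a well-known file, exactly one succeeds and records its identity there, and everyone else then knows where to find the primary.

```cpp
#include <iostream>
#include <optional>
#include <string>

// Hypothetical lock handle: only one candidate can hold it at a time.
// In a real cell this would be backed by the replicated Chubby service.
class MasterLock {
public:
    // Try to become primary; returns true for exactly one caller.
    bool TryAcquire(const std::string& candidate_address) {
        if (holder_) return false;              // someone already won
        holder_ = candidate_address;
        return true;
    }
    std::optional<std::string> Holder() const { return holder_; }
private:
    std::optional<std::string> holder_;
};

int main() {
    MasterLock lock;   // stands in for a well-known lock file, e.g. a master file

    // Two would-be masters race for the lock; only one wins.
    for (const std::string& candidate : {"serverA:7000", "serverB:7000"}) {
        if (lock.TryAcquire(candidate))
            std::cout << candidate << " acquired the lock and is now primary\n";
        else
            std::cout << candidate << " lost the election and will watch for failover\n";
    }

    // Clients (or workers needing a rendezvous point) just read the holder.
    std::cout << "current primary: " << lock.Holder().value_or("none") << "\n";
}
```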

An interesting design choice made early on was to provide Chubby as a lock service rather than as a library that developers could use in their programs. Both ways have their pros and cons, but the big downside of the library approach is that the programmer would have to structure their program as a state machine (required for Paxos), which requires a non-trivial amount of forethought and much greater redesign work after the fact. They go on to state that they preferred the lock-service idea since it’s relatively familiar to programmers, despite knowing that programmers often don’t use locks correctly, and that the problem is even harder in distributed systems.

Like BigTable, metadata stored in Chubby contains a version number per file, but unlike BigTable, old versions cannot be recovered. It appears this version number is primarily used to determine whether or not a user has the most recent version of a file before they update it (e.g., they grab version 4 and want to write changes as version 5, but in the meantime someone else comes in and writes two sets of changes, pushing the current version number to 6).
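Here’s a small sketch of how such a version check might be used; the FileState struct and WriteIfVersion function are mine, not Chubby’s API, but the reject-and-retry flow on a stale version is the idea described above.

```cpp
#include <iostream>
#include <string>

// Hypothetical file record carrying the per-file version number that
// Chubby metadata keeps (old versions themselves are not recoverable).
struct FileState {
    std::string contents;
    long version = 0;
};

// Write only if the caller still has the latest version; otherwise reject,
// so a concurrent writer's changes are not silently overwritten.
bool WriteIfVersion(FileState& file, const std::string& new_contents, long expected_version) {
    if (file.version != expected_version) return false;   // stale writer
    file.contents = new_contents;
    ++file.version;
    return true;
}

int main() {
    FileState file{"config v4", 4};

    long seen = file.version;                          // we read version 4...
    WriteIfVersion(file, "someone else's edit", 4);    // ...but another client writes first
    WriteIfVersion(file, "another edit", 5);           // ...and again, version is now 6

    // Our update based on version 4 must fail; we re-read and retry instead.
    if (!WriteIfVersion(file, "our edit", seen)) {
        seen = file.version;
        WriteIfVersion(file, "our edit, rebased on version 6", seen);
    }
    std::cout << "final: \"" << file.contents << "\" at version " << file.version << "\n";
}
```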

Furthermore, while Chubby is a lock service, it only provides advisory locks (not mandatory ones). Nothing prevents a client from writing or reading a particular file they do not hold the lock for, provided they have the correct ACLs to do so. The motivation behind this is largely convenience: it is perceived as a failure if a user has to reboot their machine just to force it to release a lock, and having them simply learn that their lock is out of date is far preferable. Which brings us to a not-so-minor point we lightly touched on earlier: locks must be renewed constantly, as the leases on them tend to be short (although they are variable in length, as set by Chubby). We shall later see that the traffic incurred by these renewals is the greatest source of traffic in the system by an order of magnitude.
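To make the renewal idea concrete, here’s a sketch of the loop a client might run; SendKeepAlive and the lease length below are placeholders of mine, not the real protocol details.

```cpp
#include <chrono>
#include <iostream>
#include <thread>

using Clock = std::chrono::steady_clock;

// Hypothetical stand-in for a KeepAlive RPC to the Chubby master; in the
// real system this exchange is also where the new lease length is granted.
bool SendKeepAlive() { return true; }   // pretend the master always answers

int main() {
    const auto lease_length = std::chrono::seconds(3);   // illustrative, not the real value
    auto lease_expiry = Clock::now() + lease_length;

    for (int i = 0; i < 3; ++i) {
        // Renew well before the lease runs out; these KeepAlives are the
        // bulk of Chubby's traffic, which is why proxies help so much.
        std::this_thread::sleep_for(lease_length / 3);
        if (Clock::now() >= lease_expiry || !SendKeepAlive()) {
            std::cout << "lease expired: any locks we hold must be treated as invalid\n";
            return 1;
        }
        lease_expiry = Clock::now() + lease_length;
        std::cout << "lease renewed\n";
    }
}
```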

In an attempt to make using Chubby more user-friendly, calls to it can be either synchronous or asynchronous. The programmer can also specify a callback function to be invoked when certain events happen. The primary events cited in the paper are a master failing over, locks becoming invalid, and file contents being modified by a different user (all cases we’ve talked about earlier). Two methods are also given to close a user’s handle: a Close method, which destroys the object the user is manipulating (this is all C++, so we have to manage memory ourselves), and a Poison method, which does not free the object but renders it unusable (all method calls on it fail) so that other threads can free it.
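A sketch of what the callback and Poison/Close behavior looks like from the client’s side; the Handle class, the Event enum, and the method signatures are illustrative inventions of mine, not the actual Chubby C++ API.

```cpp
#include <functional>
#include <iostream>
#include <string>

// Illustrative event set; the paper's callbacks fire on things like master
// fail-over, a lock becoming invalid, or file contents changing.
enum class Event { MasterFailedOver, LockInvalidated, ContentsModified };

// Hypothetical handle to an open Chubby file/lock.
class Handle {
public:
    void OnEvent(std::function<void(Event)> cb) { callback_ = std::move(cb); }
    void Deliver(Event e) { if (callback_ && !poisoned_) callback_(e); }

    // Poison: do not free the handle, just make every further call fail,
    // so another thread can safely delete it later. Close would destroy it.
    void Poison() { poisoned_ = true; }
    bool Read(std::string* out) const {
        if (poisoned_) return false;            // poisoned handles always fail
        *out = "file contents";
        return true;
    }
private:
    std::function<void(Event)> callback_;
    bool poisoned_ = false;
};

int main() {
    Handle h;
    h.OnEvent([](Event e) {
        if (e == Event::ContentsModified)
            std::cout << "someone else changed the file; re-read it\n";
    });

    h.Deliver(Event::ContentsModified);         // asynchronous notification

    h.Poison();                                 // render unusable, don't free
    std::string data;
    std::cout << "read after Poison succeeded? " << std::boolalpha << h.Read(&data) << "\n";
}
```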

That being said, let’s look at exactly how Chubby performs. These numbers are lifted straight from the paper and claim to be typical of Chubby instances at Google (note that RPC rates are over a period of ten minutes):

[Table: performance statistics of a typical Chubby instance, from the paper]

This is a good summary of everything we’ve talked about: extreme availability (one 14-second downtime every 18 days) in the face of a huge number of clients (54,000). And like we mentioned earlier, the lock-renewal messages (KeepAlive RPCs) make up the vast majority of RPCs (93%, versus 2% for getting a file’s metadata, the next closest contender). Since there are so many messages going to the server, a proxy can be used (see the proxied-clients row) to reduce the load going to the Chubby server, and the paper does not understate its importance:

We have found that the key to scaling Chubby is not server performance; reducing communication to the server can have far greater impact.
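To see why that matters, here’s a back-of-the-envelope sketch: if a proxy fronts many clients and folds their KeepAlives into a single stream, the dominant source of traffic shrinks by roughly the number of clients per proxy. The clients-per-proxy figure below is made up for illustration.

```cpp
#include <iostream>

int main() {
    // Back-of-the-envelope sketch: if KeepAlives are ~93% of RPCs (as in the
    // paper's table) and each proxy fronts `clients_per_proxy` clients with a
    // single KeepAlive stream, the load reaching the master drops sharply.
    const double total_rpcs = 100.0;            // normalize to 100 RPCs
    const double keepalive_share = 0.93;        // fraction that are KeepAlives
    const double clients_per_proxy = 100.0;     // assumed, not from the paper

    double keepalives = total_rpcs * keepalive_share;
    double other = total_rpcs - keepalives;
    double proxied_keepalives = keepalives / clients_per_proxy;

    std::cout << "RPCs reaching the master without proxies: " << total_rpcs << "\n"
              << "RPCs reaching the master with proxies:    "
              << (proxied_keepalives + other) << "\n";
    // KeepAlive traffic shrinks by roughly the number of clients per proxy,
    // which is why the paper stresses reducing communication over raw speed.
}
```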

So now you know a decent amount about Chubby and how it interoperates with the other Google services. If I’ve piqued your interest at all you should certainly read the paper itself, as it goes into more detail about everything we’ve talked about and more (design decisions, human factors, and so on). It also talks a little bit more about the various Chubby instances, such as how there is typically one five-replica instance per data center and one five-replica instance spread out across the world (the “global cell”), ensuring high availability and relatively quick access speeds no matter where you are on the planet.

Something to end on today is to note the recurring theme of leader election in distributed systems. It came up originally in the GFS paper, showed up again in BigTable, and once more here in Chubby. If you aren’t familiar with how this is done, then stay tuned for next time, when we shall look at how Google solved this problem using a well-known solution (the Paxos algorithm) and the problems they ran into.