Byzantine Reality

Searching for Byzantine failures in the world around us


Over the last few months we’ve been working away on something that we think is pretty cool, and just two weeks ago we finally released the first version of it (which was naturally followed up by another release to fix the bugs in the first). It’s something we call AppScale, a platform on which you can run Google App Engine apps. But how does it differ from the platform that Google gives you to run App Engine apps on your local computer, or from the platform Google hosts them on? Let’s explore that together!


Before AppScale, there were two common deployment options for App Engine apps:

  1. Your local machine, primarily used for testing. Data is saved into a flat-file database and there’s no real support for authentication. For testing though, it’s perfectly fine, since that’s all you’re using it for.
  2. Google’s infrastructure. It’s a giant black box, but in all likelihood it’s storing data into BigTable. Authentication is easy since it uses Google Accounts, but now you’re at the mercy of Google.

The downsides of both these approaches (mostly the second though) led to the development of AppScale’s predecessor, AppDrop. AppDrop was created by J. Chris Anderson and is essentially a Ruby on Rails app that allows you to upload Google App Engine apps into it and have it run them. This was an excellent starting point, but we wanted to take it a step further. I looked through AppDrop, and with Chris’ help, I was able to get a good grasp on it. It’s not particularly complicated, but it was my first Rails app and I just didn’t know how Rails did things.

Before we can go a step further, we need to introduce the other critical piece of the puzzle: Eucalyptus. It’s an open-source Infrastructure-as-a-Service (IaaS), which is the lowest level of abstraction as far as cloud computing goes. You ask it for a certain number of virtual machines, and it spawns them and returns their IPs. If you’re familiar with Amazon’s EC2, then you pretty much already know Eucalyptus, since they’re API compatible (the EC2 tools work if pointed at Eucalyptus instead of EC2).

Simply put, combining AppDrop and Eucalyptus yields AppScale. AppScale is an open-source Platform-as-a-Service (PaaS) that sits on top of Eucalyptus and runs App Engine apps. Instead of saving data into a flat-file database like Google offers you for testing, AppScale offers the option of using either HBase or Hypertable. Both are open-source implementations of Google’s BigTable and rest on the Hadoop Distributed File System, an open-source implementation of the Google File System.

We have a technical report (pdf) detailing the specifics, so I’ll just gloss over them here. The base configuration of AppScale is pretty simple, containing four virtual machines. One node acts as a load balancer (the paper refers to it as the AppLoadBalancer) and the other three nodes host your App Engine apps (similarly dubbed AppServers). Communication between the nodes is done over SOAP by a daemon that sits on each node, the AppController.
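To make that layout a bit more concrete, here is a minimal Python sketch of the four-node base configuration. It is not AppScale code: the `Node` class, the `base_configuration` and `round_robin` helpers, and the example IPs are all made up for illustration, and round-robin is just an assumed policy, since this post doesn’t say how the AppLoadBalancer actually picks an AppServer.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass(frozen=True)
class Node:
    ip: str
    role: str  # "AppLoadBalancer" or "AppServer"


def base_configuration(ips):
    """Assign roles to four VM IPs: the first is the load balancer,
    the remaining three are AppServers."""
    if len(ips) != 4:
        raise ValueError("the base configuration uses exactly four nodes")
    balancer = Node(ips[0], "AppLoadBalancer")
    servers = [Node(ip, "AppServer") for ip in ips[1:]]
    return [balancer] + servers


def round_robin(nodes):
    """Return an endless iterator over the AppServer nodes.
    Round-robin is an assumption made for this sketch."""
    servers = [n for n in nodes if n.role == "AppServer"]
    return cycle(servers)


if __name__ == "__main__":
    nodes = base_configuration(["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"])
    picker = round_robin(nodes)
    for _ in range(4):
        print(next(picker).ip)
```

The IPs passed in stand for whatever Eucalyptus hands back when the virtual machines are spawned.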

The big thing we’ve held off on at this point is the last half of our project’s name, the scaling part. We’ve seen quite a few metrics that other products use to decide when to create more virtual machines and when to destroy them, but it’s not clear which metrics make the best indicators. So for now we’ve taken the simpler way out and only spawn up a static configuration. Over time we’ll investigate which metrics are best and report on them, and of course, since it’s open-source, anyone else can too! You can verify our numbers, make your own way of spawning more nodes, and so on. For now though, we keep it pretty simple and only measure CPU usage and memory usage, but you can rest assured that will grow to other metrics soon.
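Just to show what a metric-driven policy could look like, here is a tiny Python sketch of a threshold rule over the two numbers we do measure. To be clear, AppScale does not do this today: the function name, the 80% and 20% thresholds, and the "spawn"/"destroy"/"hold" labels are all hypothetical.

```python
def scaling_decision(cpu_percent, mem_percent, high=80.0, low=20.0):
    """Suggest an action from CPU and memory usage (both in percent).

    The thresholds are placeholder values, not measured ones."""
    if cpu_percent > high or mem_percent > high:
        return "spawn"    # either resource is under pressure
    if cpu_percent < low and mem_percent < low:
        return "destroy"  # both resources are mostly idle
    return "hold"


if __name__ == "__main__":
    print(scaling_decision(cpu_percent=91.0, mem_percent=40.0))
    print(scaling_decision(cpu_percent=5.0, mem_percent=10.0))
```

Whether a rule this simple is any good is exactly the open question: it says nothing about how long a spike has to last, or how long a new virtual machine takes to come up.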

Of course we have many other ideas as well going forward. We have instrumented versions of Ruby and Python running in AppScale that can dump call trees and calling context trees to tell you which methods in your program called others (as well as how often) and allow you to find where time is being spent in your program. Finding a way to report this back to the user in a simple, aggregated way is something we’re looking at. But since I don’t want to give too much away, I won’t spoil the surprise of other things we’re working on.
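If you want a feel for the "who called whom, and how often" idea without an instrumented interpreter, stock Python can get you part of the way with `sys.setprofile`. The sketch below is not what runs inside AppScale; it only counts caller/callee pairs for Python-level calls, and the `record_call_edges` helper is a name invented for this example.

```python
import sys
from collections import Counter


def record_call_edges(func, *args, **kwargs):
    """Run func and count (caller, callee) pairs for Python-level calls."""
    edges = Counter()

    def profiler(frame, event, arg):
        # "call" fires when a Python function is entered; f_back is the caller.
        if event == "call" and frame.f_back is not None:
            caller = frame.f_back.f_code.co_name
            callee = frame.f_code.co_name
            edges[(caller, callee)] += 1

    previous = sys.getprofile()
    sys.setprofile(profiler)
    try:
        result = func(*args, **kwargs)
    finally:
        sys.setprofile(previous)  # restore whatever profiler was active before
    return result, edges


def helper():
    return 1


def work():
    return helper() + helper()


if __name__ == "__main__":
    _, edges = record_call_edges(work)
    for (caller, callee), count in edges.items():
        print(f"{caller} -> {callee}: {count}")
```

This records flat edges only. A calling context tree would also keep the full chain of callers for each call, and neither version here tracks where time is being spent.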

So if you happen to have a few machines lying around, give it a try! If you already have Eucalyptus installed, you should be able to get AppScale up and going in no time at all. Let us know what you think about it and what you need from it to make it a viable platform! Either way, be sure to check back next time for our thanks to the various pieces of software that make up AppScale and how they got it working for us.