Byzantine Reality

Searching for Byzantine failures in the world around us

Cloud Computing

So it seems the buzz in the distributed computing world is all about the magical realm of “Cloud Computing”. Everyone seems to have a different definition of what Cloud Computing is, though, so for the purposes of this post, we’ll define it as follows:

Cloud Computing: Doing important work using resources that do not physically belong to you.

This definition is just vague enough to encapsulate the various “areas” of cloud computing that people talk about and just precise enough to note that the resources you use here tend to be virtualized in data centers (or even in virtual data centers).

Now that we’ve got the necessities out of the way, let’s look at a few providers of Cloud Computing services: Amazon, 3tera, RightScale, and Google.

Amazon provides the basic underlying infrastructure, which can be used on its own or with help from the guys and gals at 3tera and RightScale. EC2 (Elastic Compute Cloud) lets a developer acquire an arbitrary number of virtualized hosts that work together, and S3 (Simple Storage Service) gives those hosts a place to store data. If one host goes down, it’s not a problem, since another virtualized instance can be fired up to take its place. They’ve also got “Availability Zones”, which let you spread your instances across separate locations so that a failure in one of them doesn’t take the whole app down. But as this technology has seen the most press out of everything we’ll be talking about, I don’t think it warrants more attention here.

3tera provides AppLogic, an application that runs in your web browser and puts a beautiful front-end on services such as Amazon EC2. It resembles a flowchart-style program (think Microsoft Visio, Dia, or OmniGraffle) in which developers “hook up” various applications with arrows that represent some sort of communication. The demo shows a LAMP-like setup wherein the developers set up a load balancer, some web servers, SANs, and a MySQL DB. The main reason this looks like it works so seamlessly is that the domain appears to be so limited (in the grand scheme of programming). AppLogic seems to be geared toward web applications, and if that’s what you’re doing, AppLogic looks amazing. They claim in the demo that developers need about three weeks to get a good handle on the software, which isn’t bad at all for writing highly reliable, scalable web apps.

RightScale performs a similar service, in that they are also primarily focused on web apps. Both companies also provide dashboard widgets that show up-to-date statistics on CPU usage, DB usage, anything you like (surprisingly fine-grained). RightScale also has a service / program called RightGrid, which diverges from the web apps track. It uses Amazon EC2 to run many independent serial computations across several computers, which is better explained with an example. If I write some code that drains all of the CPU on a computer to compress a video file and I need to do this to 100 videos, RightGrid gives me the opportunity to dynamically create new virtualized hosts and send them the work to do. The important distinction is that they don’t seem to be automatically parallelizing the work itself (that is, there’s no gain if I have 100 computers and 1 video to compress), but there is a huge gain in the more-or-less serial case (100 computers and 100 videos).
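To make that distinction concrete, here is a rough Python sketch of the scheduling model described above. It is not RightGrid's API, just a toy model: the function names and the 10-minutes-per-video figure are made up for illustration, and it assumes every video takes the same time and is never split across hosts.

```python
import math


def assign_round_robin(videos, num_hosts):
    """Hand each whole video to one host (hypothetical helper; videos are never split)."""
    if num_hosts < 1:
        raise ValueError("need at least one host")
    hosts = [[] for _ in range(num_hosts)]
    for i, video in enumerate(videos):
        hosts[i % num_hosts].append(video)
    return hosts


def makespan_minutes(num_videos, num_hosts, minutes_per_video):
    """Wall-clock time when each video is an indivisible job on a single host."""
    if num_hosts < 1:
        raise ValueError("need at least one host")
    # Each host compresses one video at a time, so the work proceeds in rounds.
    rounds = math.ceil(num_videos / num_hosts)
    return rounds * minutes_per_video


# Assumed figure for illustration only: 10 minutes to compress one video.
one_host = makespan_minutes(100, 1, 10)         # 100 videos, 1 host
many_hosts = makespan_minutes(100, 100, 10)     # 100 videos, 100 hosts
single_video = makespan_minutes(1, 100, 10)     # 1 video, 100 hosts
```

Under these assumptions, 100 videos on one host take 1000 minutes, 100 videos on 100 hosts take 10 minutes, and a single video still takes 10 minutes no matter how many hosts are idle, because the job itself is never broken up.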

The final player we’ll look at in our of-course-not-complete list of Cloud Computing players is Google’s App Engine. They’re a little bit different from the other guys, so they deserve some special mention. For starters, they’re free but much more restricted. You’re limited to 5 million pageviews per month (still a huge number by any account) and the code you write has to be in Python. Your code can’t write to the file system, although you do get persistent storage through the Datastore DB (API provided), which uses a SQL-like syntax called (can you guess it?) GQL! They also provide APIs for image manipulation, high performance caching, and some limited Google Accounts stuff. Finally, you can’t write any multi-threaded code or use any libraries that are implemented as C or Java code. So there’s no explicit “web-apps only” feel here, but since they push Django a bit, it seems like this is web apps territory again.

Not that I have a problem with web apps, of course. Web applications are pretty cool to show off to your friends and family, and without them I wouldn’t have a place to blog on the ‘Net :). These are just a few of the Cloud Computing service providers, and so far it all looks pretty interesting and promising. As I may be researching this field over the summer, check back for updates and the like.