Byzantine Reality

Searching for Byzantine failures in the world around us

Articles tagged with 'cloud computing'

Active Cloud DB Presentation

At this year’s CloudComp I presented on Active Cloud DB, our group’s new software that exposes a RESTful API to anything that runs Google App Engine apps. Here’s our paper on it, and here are the slides I presented. Enjoy!

CloudComp 2010 Field Report

This week I am at CloudComp 2010 in Barcelona, Spain to present our group’s recent paper on Active Cloud DB. So I figured that while I was here, I might as well report on the keynote talks given as well as the best of the papers presented there. Enjoy!

Day One

Keynote Talk One:

Ignacio Llorente, project lead over at OpenNebula, gave the first talk of the conference, discussing the various features that the OpenNebula cloud infrastructure offers its users. One of the more interesting features he discussed involved hybrid cloud support – specifically, OpenNebula can act as a broker between clouds, providing a single, unified interface between any number of private clouds you may happen to have and public clouds. Unfortunately, he didn’t get into many technical details and really just listed all of OpenNebula’s features, which I guess is ok since it was a keynote talk, but for an hour-long keynote I was expecting at least some depth and not just breadth.

Top Paper from Day One:

Alexander Reinefeld gave an interesting talk named Data Management in Clouds, which (1) is a really generic paper title and inevitably (2) IMHO was quite a bit of a misnomer, since it was really about XtreemFS, a file system targeted at the grid computing world. He said we needed a cloud file system, but didn’t really make the case that this was “cloudy” in any way – there didn’t seem to be any discussions about elasticity or anything else we seem to hear about in the cloud world. Otherwise the talk seemed pretty solid – it doesn’t really seem to be a research project as much as being a production-level-ish piece of software.

Keynote Talk Two:

Alvaro Arenas gave the second day’s keynote talk, discussing the XtreemOS project. He went for the opposite approach of the previous keynote speaker – lots of depth on one part of XtreemOS (the security layer) and very little on the rest of the system. Unfortunately, it didn’t seem to be really talking about cloud computing at all – the project is firmly rooted in the grid computing world and doesn’t really talk about challenges or benefits in the cloud world. Specifically, I was looking for him to say anything about how an operating system works when the number of nodes in the system changes (a common cloud use case) or anything about SLAs (also extremely common to the cloud) but didn’t hear anything about either.

Top Paper from Day Two:

(Aside from my paper, which was also on day two, of course!)

Guillaume Pierre talked about CloudTPS, their way to get ACID transactions in NoSQL databases. It basically puts a transaction manager in the system which handles this on behalf of the database. Of course, this component is distributed (many nodes) and implements the well-known two-phase commit protocol. The presentation was very interesting but there were a few things I would have liked to have seen that weren’t there:

  • How the evaluation is performed was a bit vague – he was a bit low on time at this point, and I would have liked any explanation of the graphs more than “yay it scaled!”
  • We sharply disagreed on what “consistency” was – they claim that Amazon SimpleDB was eventually consistent even with consistent reads and that NoSQL datastores favor availability over consistency – which just isn’t so (see HBase, Cassandra with consistent reads/writes, MemcacheDB, and so on). After a bit of discussion I believe he was trying to say that consistency to him meant ACID – but clearly it is not the case that these datastores favor availability over consistency just because they don’t have ACID transactions.

Regardless, the talk was very interesting and the general idea was sound.
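
For anyone who hasn’t run into two-phase commit before, here’s a minimal single-process sketch of the protocol in Python. To be clear, this is my own illustration and not CloudTPS’s code: the `Participant` class and `two_phase_commit` function are made-up names, and a real implementation would add logging, timeouts, and crash recovery on top of this.

```python
class Participant:
    """A hypothetical node holding part of the data touched by a transaction."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit  # lets us simulate a node that votes "no"
        self.data = {}
        self.staged = None

    def prepare(self, writes):
        # Phase one: stage the writes and vote yes or no.
        if not self.can_commit:
            return False
        self.staged = dict(writes)
        return True

    def commit(self):
        # Phase two (success): make the staged writes visible.
        if self.staged is not None:
            self.data.update(self.staged)
        self.staged = None

    def abort(self):
        # Phase two (failure): throw the staged writes away.
        self.staged = None


def two_phase_commit(participants, writes):
    """Return True if every participant committed, False if all aborted."""
    votes = [p.prepare(writes) for p in participants]
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.abort()
    return False
```

The point to notice is that nothing becomes visible in `data` until every participant has voted yes, which is what gives you the all-or-nothing behavior.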

Keynote Talk Three:

Kate Keahey, project lead at the Nimbus Project, gave the last keynote of the conference, about the pros and cons of using cloud computing for running scientific experiments. She gave a breakdown of how Nimbus is implemented as well as the tools out there that sit on top of it.

I found her talk to be the most straightforward and the best talk of the conference (again, outside of mine of course). I found her approach of talking in some depth about many topics to be far preferable to the other keynotes’ styles of no depth and all breadth (keynote 1) or all depth and no breadth (keynote 2).

She also talked about how Nimbus is used in real science, with a number of cool use cases and a good but brief discussion of how they run their open-source world. It was fairly simple – a few core committers on the project and a few more on github, but since it usually isn’t talked about too much in these settings, I found it to be insightful.

Top Paper from Day Three:

Burkhard Neidecker-Lutz, one of the conference’s program chairs, stepped in and gave a talk on a paper that the authors were unable to present themselves, on a framework for information and billing in the cloud. I found this paper to be unique not because of the actual paper itself – the slides in fact were blobs of text and somewhat impenetrable – but because of how Burkhard was able to take a different group’s paper and not-so-great slides and really turn it around. He was able to use examples from other papers seen at the conference to really save that paper and make for a very interesting discussion, at the least. So my thanks go out to him for showing that it can be done – others who took up presenting papers that weren’t theirs just read it real fast and ran for the hills, so this was good proof that it can be done the right way.

Wrapping Things Up

Unfortunately I couldn’t find a list of the papers online anywhere, but it looks like most of the papers can be found via our friend the Google. While mulling things over at the conference and in this hotel room, I also have a number of new interesting cloudy ideas, so stay tuned!

Big Data 2010 Workshop

Today Raj and I hit up the Computer History Museum for the Big Data Workshop and had a pretty good time. Here’s the lowdown on the sessions I attended:

  • Cassandra Explained: This session was jam-packed full of people and talked about how Cassandra is laid out as well as upcoming features. Interesting features that may be coming to Cassandra include vector clocks instead of timestamps (currently of the ‘long’ data type), possibly ditching SuperColumns (the consensus was that they’re very powerful but too confusing to developers), and including “sloppy quorums” (maybe more on this in a later post).
  • Glue: This session had a lot fewer people but was focused on the role of middleware in today’s database systems. A lot of talk went on about systems that use multiple databases concurrently (e.g., MySQL and Cassandra together) and the problems that can come up. Apparently the popular solution for problems like this involves sticking datastore requests in a message queue (ActiveMQ and RabbitMQ immediately come to mind). Definitely not a solution that I had in mind (nor anyone else that I talked to who wasn’t in the session) but gives interesting food for thought.
  • Graph DBs: This session had a lot of people as well but I felt was a missed opportunity. It stayed very high-level the entire time and didn’t tell me any useful implementation info like Cassandra had. The primary questions that I ask about a new NoSQL datastore are typically of the form “How do I run it?”, “What is the replication model?”, and “What’s the relationship between nodes in the system?”, but unfortunately, none of these questions were answered. The talk started off nicely, with talk about high-level concepts, but devolved into things that were way too specific and not helpful to me as an actual developer.
  • The Babbage Machine: This was actually really cool. We heard a talk about the history involved and got to see it in action. After that, I wandered around the museum a bit and saw a picture of Garry Kasparov v. Deep Blue that was pretty close to this one:

Since I love chess and computers, I can understand just a small amount of the anguish / stress that Garry Kasparov is in during his epic battle with Deep Blue (of course I far from understand what’s actually going on in his head). However, it still made for a cool exhibit to see the whole history of chess and computers.

  • Limitations and Alternatives to MapReduce: This was actually a really small group, and just chatted about when to use MapReduce and what other technologies are appropriate. This was nice as well but could have been a bit more technical.

All in all, it was good times, and fun was had by all. I would have loved to have caught Chris Anderson’s talk on going “beyond the cloud”, as the general consensus was that it was great, but seeing the Babbage Machine and wandering through the museum was totally worth it. I picked up some sweet books as well, so check back later for reviews on those.
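
Going back to the Cassandra session for a second: if vector clocks are new to you, here’s a small Python sketch of the idea. This is just my illustration of the general technique, not anything from Cassandra’s code base, and the function names are my own. Each clock is a dict mapping a node name to a counter, and unlike a single timestamp, two clocks can turn out to be concurrent (neither happened before the other).

```python
def increment(clock, node):
    """Return a copy of clock with node's counter bumped by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def merge(a, b):
    """Combine two clocks by taking the larger counter for each node."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in set(a) | set(b)}


def compare(a, b):
    """Return 'before', 'after', 'equal', or 'concurrent' for clocks a and b."""
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"
```

If two nodes both update the same starting version independently, `compare` reports `concurrent`, which is the signal that the application (or the datastore) has a conflict to resolve rather than silently letting the later timestamp win.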

AppScale - Now in HD!

A while ago I did a screencast on the basic features of AppScale, but the quality was just not what I was looking for. So when the opportunity to do a new screencast came up, I bought ScreenFlow (which I should have done from the start) and with it, slapped together a way better screencast! Enjoy!


This screencast covers features that are mostly in AppScale 1.3 and some that are planned for the 1.4 release:

  • Running an AppScale deployment: Largely the same as before, using appscale-run-instances.
  • Uploading additional AppEngine applications to an AppScale deployment: This is the same as usual, but as it shows here, the 1.4 release will allow you to upload folders in addition to .tar.gz files, using appscale-upload-app. The load balancer will also allow logged-in users to upload applications in AppScale 1.4.
  • Querying your AppScale deployment: Now shows information on the roles associated with each box (e.g., now you can see which box is running the load balancer, which is hosting your app, and so on), using appscale-describe-instances. The status page on the load balancer is also shown, which shows a lot of the same information but looks a little nicer in HTML.
  • Terminating your deployment: Same as before, using appscale-terminate-instances.

We also test out the Guestbook sample application and a new one, images-api-in-action, which is a simple demonstration of the Python Images API. Next time we’ll do a sneak peek of some other features that will appear in AppScale 1.4, so stay tuned!

AppScale 1.2 Released!

Just a quick blurb for now: the software my group works on, AppScale, has a new version out! It adds support for the Cassandra and Voldemort databases as well as the ability to deploy your Google App Engine apps to Amazon EC2, so check it out if that’s what you’re into! Here’s the changelog for the sake of completeness:

  • Now compatible with Python Google App Engine 1.2.3
  • Addition of support for Voldemort and Cassandra
  • MySQL bug fix allowing for parallel API nodes
  • Eucalyptus 1.5.2 support
  • Amazon AWS EC2 public image available (ami-7136d618)
  • Support for running Ubuntu Jaunty systems
  • Single node deployments (Cassandra or Voldemort only)
  • Ability to delete applications from a running deployment
  • Python2.6 for everything except Google App Engine (Python2.5)
  • Replication is configurable
  • Additional robustness and bug fixes

Major Area Exam

Just in case you were interested, I gave a talk last week at UCSB covering everything having to do with cloud computing. Unfortunately, I was only able to tape the first ten minutes of it, so I pretty much just cover virtualization and some basic introduction stuff. Here are the slides if you want to follow along at home or see what I talked about after the video ended. Enjoy!


AppScale

Over the last few months we’ve been working away on something that we think is pretty cool, and just two weeks ago we finally released the first version of it (which was naturally followed up by another release to fix the bugs in the first). It’s something we call AppScale, a platform on which you can run Google App Engine apps. But how does it differ from the platform that Google gives you to run App Engine apps on your local computer or the platform they host it in? Let’s explore that together!

Before AppScale, there were two common deployment options for App Engine apps:

  1. Your local machine, primarily used for testing. Data is saved into a flat-file database and there’s no real support for authentication. For testing though, it’s perfectly fine, since that’s all you’re using it for.
  2. Google’s infrastructure. It’s a giant black box, but in all likelihood it’s storing data into BigTable. Authentication is easy since it uses Google Accounts for authentication, but now you’re at the mercy of Google.

The downsides of both these approaches (mostly the second though) led to the development of AppScale’s predecessor, AppDrop. AppDrop was created by J. Chris Anderson and is essentially a Ruby on Rails app that allows you to upload Google App Engine apps into it and have it run them. This was an excellent starting point, but we wanted to take it a step further. I looked through AppDrop, and with Chris’ help, I was able to get a good grasp on it. It’s not particularly complicated, but it was my first Rails app and I just didn’t know how Rails did things.

Before we can go a step further, we need to introduce the other critical piece of the puzzle: Eucalyptus. It’s an open-source Infrastructure-as-a-Service (IaaS) that is the lowest level of abstraction as far as cloud computing goes. It allows users to ask it for a certain number of virtual machines and spawns them up, returning the IPs of the boxes spawned up. If you’re familiar with Amazon’s EC2, then you pretty much already know Eucalyptus, since they’re API compatible (the EC2 tools work if pointed at Eucalyptus instead of EC2).

Simply put, combining AppDrop and Eucalyptus yields AppScale. AppScale is an open-source Platform-as-a-Service (PaaS) that sits on top of Eucalyptus and runs AppEngine apps. Instead of saving data into a flat-file database like Google offers you for testing, AppScale offers the option of using either HBase or Hypertable. Both are open-source implementations of Google’s BigTable and rest on the Hadoop Distributed File System, an open-source implementation of the Google File System.

We have a technical report (pdf) detailing the specifics of the system, so I’ll just gloss over it here. The base configuration of AppScale is pretty simple, containing four virtual machines. One node acts as a load balancer (the paper refers to it as the AppLoadBalancer) and the other three nodes host your AppEngine apps (similarly dubbed AppServers). Communication between the nodes is done over SOAP by a daemon that sits on each node, the AppController.

The big thing we’ve held off on at this point is the last half of our project’s name, the scaling part. We’ve seen quite a few metrics that other products use to decide when to create more virtual machines and when to destroy them, but it’s not clear which metrics make the best indicators. So for now we’ve taken the simpler way out and only spawn up a static configuration. Over time we’ll investigate which metrics are best and report on them, and of course, since it’s open-source, anyone else can too! You can verify our numbers, make your own way of spawning more nodes, and so on. For now though, we keep it pretty simple and only measure CPU usage and memory usage, but you can rest assured that will grow to other metrics soon.
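
To make the idea a bit more concrete, here’s a rough sketch of what a threshold-based scaling decision on CPU and memory could look like. This is not AppScale’s code (as I said, we only spawn a static configuration right now); the function name and the 80% / 20% thresholds are placeholders I made up for illustration.

```python
def scaling_decision(cpu_samples, mem_samples, high=0.80, low=0.20):
    """Return 'add', 'remove', or 'hold' from per-node utilization samples.

    cpu_samples and mem_samples are lists of fractions between 0 and 1,
    one per node. The thresholds are placeholder values, not tuned numbers.
    """
    if not cpu_samples or not mem_samples:
        return "hold"
    avg_cpu = sum(cpu_samples) / len(cpu_samples)
    avg_mem = sum(mem_samples) / len(mem_samples)
    # Scale up if either resource is under pressure.
    if avg_cpu > high or avg_mem > high:
        return "add"
    # Scale down only when both resources are mostly idle.
    if avg_cpu < low and avg_mem < low:
        return "remove"
    return "hold"
```

The asymmetry is deliberate in this sketch: adding a node when either metric runs hot is cheap insurance, while removing one only when both are idle avoids flapping. Whether those are the right metrics at all is exactly the open question.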

Of course we have many other ideas as well going forward. We have instrumented versions of Ruby and Python running in AppScale that can dump call trees and calling context trees to tell you which methods in your program called others (as well as how often) and allow you to find where time is being spent in your program. Finding a way to report this back to the user in a simple, aggregated way is something we’re looking at. But since I don’t want to give too much away, I won’t spoil the surprise of other things we’re working on.
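
If you want a feel for what that kind of data looks like, here’s a tiny stand-in you can run with stock Python. It is not our instrumented interpreter: it just uses the standard `sys.setprofile` hook to count caller/callee pairs, which gives you a flattened call graph rather than a full calling context tree. The `trace_calls` name is my own.

```python
import sys
from collections import Counter


def trace_calls(func, *args, **kwargs):
    """Run func and count (caller, callee) pairs among Python-level calls."""
    edges = Counter()

    def profiler(frame, event, arg):
        # Only Python function calls are counted; C calls are ignored.
        if event == "call" and frame.f_back is not None:
            caller = frame.f_back.f_code.co_name
            callee = frame.f_code.co_name
            edges[(caller, callee)] += 1

    sys.setprofile(profiler)
    try:
        result = func(*args, **kwargs)
    finally:
        # Always remove the hook, even if func raises.
        sys.setprofile(None)
    return result, edges
```

Calling `trace_calls(some_function)` returns the function’s result along with a `Counter` keyed by `(caller, callee)` names, so you can see which functions called which and how often.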

So if you happen to have a few machines lying around, give it a try! If you already have Eucalyptus installed, you should be able to get AppScale up and going in no time at all. Let us know what you think about it and what you need from it to make it a viable platform! Either way, be sure to check back next time for our thanks to the various pieces of software that make up AppScale and how they got it working for us.

HighScalability Kicks Ass, and a look forward

For those of you who haven’t checked it out yet, highscalability.com reports on a lot of the fun wacky technologies involved in making websites you’ve all come to know and love and how they can scale to the ridiculous amount of traffic that comes with it. Pretty much all the articles keep track of rebuttals to their points all on the same page so it ends up being a fun way to dive into some new tech you’ve never seen before. They also do a good job of scouring the net and aggregating good distributed systems stuff together in one place. My favorites so far:

Great introductions and notes about database sharding with relational databases:

An Unorthodox Approach to Database Design : The Coming of the Shard

Scalability Strategies Primer: Database Sharding

A great collection of notable papers in distributed systems:

Readings in Distributed Systems

And in the world of the cloud:

Anti-RDBMS: A list of distributed key-value stores

Paper: MapReduce: Simplified Data Processing on Large Clusters

So as the title states, this is a look forward. Expect to be hearing about all of these again sooner rather than later (in no particular order).

Python / Ruby SOAP Greatness

While working on our multi-language SOAP project, we’ve run into a number of interesting quirks. If you end up doing the same, you should certainly try out a number of “base cases” to make sure you get a good grip on it. For example, in Ruby, you can pass strings back and forth with no problem over SOAP (presumably this works the same way for Python-only communication).

Our problem comes in that one computer runs a Ruby SOAP server/client, and the other runs a Python SOAP server/client. We’ve seen that if the Python server returns the empty string “”, then the Ruby client will see it as a SOAP object (instead of just “”). It works fine for any other string, just not the empty string.

The problem is that this isn’t something you would have immediately recognized beforehand. This may be due to the fact that we’re using dynamic languages (perhaps we wouldn’t have this issue in Java), but as I haven’t tested it, I don’t know.
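
Here’s roughly what I mean by trying out “base cases”, sketched in Python. The `echo` argument is a stand-in for whatever function makes the remote call and hands back the server’s reply (I’m not showing any particular SOAP library’s API here), and the list of cases is just a starting point I picked.

```python
BASE_CASES = ["", "hello", 0, 1, -1, 1.5, True, False, [], ["a", "b"]]


def check_round_trip(echo, cases=BASE_CASES):
    """Send each case through echo and return the ones that come back changed.

    echo is a stand-in for whatever function makes the remote call and
    returns the server's reply. Each failure is a (sent, received) pair.
    """
    failures = []
    for sent in cases:
        received = echo(sent)
        # Compare type as well as value, so "" vs. a wrapper object is caught.
        if type(received) is not type(sent) or received != sent:
            failures.append((sent, received))
    return failures
```

Run something like this against an echo method on the other side before building anything real on top, and the empty-string surprise shows up in the failures list on day one instead of deep inside your application.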

Anywho, that’s all for today! Make sure you start small and easy and once you’ve got the base assumptions about new technologies worked out, build on up from there. Happy coding!

Eucalyptus

A huge amount of buzz in the internets and especially cloud-computing land is about Amazon EC2. With EC2 you can go pay Amazon some money and get a nice little virtual computer with its own IP and all that fun stuff and throw up your web site on it. Other cloud computing vendors offer software that runs on it to make sure the apps you put on EC2 stay up no matter what (e.g., put a web site on it and make sure that no matter how much traffic it gets, it’s still able to stay functional).

But what if you wanted an open-source alternative? Enter Eucalyptus.

Disclaimer: Since Eucalyptus is a UCSB product and I’m at UCSB, I’m not entirely unbiased. But presumably you realize I’m biased and to some extent, you are too.

Eucalyptus is for people who have a cluster and want to run it in the same fashion as Amazon EC2. To that end, it’s API-compatible with EC2, meaning that the tools you use to talk to EC2 work exactly the same on Eucalyptus.

Eucalyptus puts a pretty front end around your cluster and virtualization tools in (presumably) a similar fashion as Amazon EC2. Virtualization is done via Xen but since they use the libvirt library, which claims to be virtualization-agnostic, you’ll be able to use other tools down the line.

You can see all this info and more in a presentation Rich Wolski gave at Velocity a few months ago, but my initial observations come from a different angle. I had wanted to try out EC2 for quite some time but wasn’t sure how much I’d have to pay to try it out. With Eucalyptus, if you happen to have a few (relatively recent) boxes lying around you can be up and going in no time. For free.

And the whole thing has been a giant learning experience for me. Learning about Xen, making images, all that stuff has been an awesome time. I’ve got a few images I’m going to upload to Eucalyptus soon and fiddle with, and it’s something I definitely recommend doing if you’ve got some spare time on your hands and a few boxes.

If you don’t, and you have some cash lying around (how much is uncertain to me), give EC2 a try. Let me know how much it takes you to get up and going and how it is.
