Byzantine Reality

Searching for Byzantine failures in the world around us

How AppScale Implements Transaction Support

AppScale supports pluggable databases, allowing users to run their Google App Engine applications with Cassandra as a backing store, and then to switch the same app over to HBase or Hypertable. To make this possible, we provide a layer of abstraction between the App Server (which can be written in Python, Java, or Go) and the databases, which we call AppDB. This post explains how AppDB supports transactions on top of any of those databases, and how we have extended it to support cross-group (XG) transactions in Google App Engine.

The Programming Model

The Google App Engine Datastore API lets users write Python, Java, or Go code to save and retrieve data. Let’s start off with an example. First, we define a Model, representing a class we want to store in the Datastore:

from google.appengine.ext import db

class BlogComment(db.Model):
  text = db.StringProperty(required=True)

Here, our BlogComment has only one field, text, which is a string. This isn’t a new concept – ORM has been around for a while now, which is why it’s nice that Google took this on instead of creating something drastically new. You can make a new BlogComment in your app by typing:

comment = BlogComment(key_name=key, text="hello")

And this instantiated object is referred to as an Entity. Entities are organized into Entity Groups. Standard transactions within Google App Engine operate within a single entity group, so you could make an Entity Group for each BlogPost and put all the BlogComments for that post in that Group. Then, you could add and delete BlogComments for a single BlogPost within a transaction, but you can’t operate on BlogComments across two BlogPosts within one transaction.
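
To make the grouping rule concrete, here is a minimal sketch in plain Python rather than the App Engine SDK. It assumes a key can be modelled as a path of (kind, name) pairs running from the root ancestor down to the entity, with the root naming the entity group. The helper names entity_group and same_group are made up for this illustration.

```python
# Illustrative model only: a key is a tuple of (kind, name) pairs,
# ordered from the root ancestor down to the entity itself.
def entity_group(key_path):
    """Return the root element of a key path, which names its entity group."""
    return key_path[0]


def same_group(*key_paths):
    """True if all keys share one root, so a standard transaction can cover them."""
    return len({entity_group(path) for path in key_paths}) == 1


post1 = (("BlogPost", "post1"),)
post2 = (("BlogPost", "post2"),)
comment_a = post1 + (("BlogComment", "a"),)
comment_b = post1 + (("BlogComment", "b"),)
comment_c = post2 + (("BlogComment", "c"),)

print(same_group(comment_a, comment_b))  # both comments hang off post1
print(same_group(comment_a, comment_c))  # comments on two different posts
```

Under this model, comment_a and comment_b can be changed together in a standard transaction, while comment_a and comment_c cannot.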

Now that you know what transactions look like to users, let’s talk about how we make them work in AppScale.

Transaction Support

In AppScale, we want to provide transactions with the same semantics that Google App Engine provides, but for any database that our database-agnostic layer (AppDB) supports. To do this, we need to be able to atomically acquire and release locks, so we leverage ZooKeeper to do this for us.

Let’s look at a transaction a user writes and see how this gets converted to calls to our database-agnostic layer. Suppose we have a transaction that retrieves a comment, changes its value, and stores it back in the Datastore:

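def boo():
  # blog_post is the parent, so the comment lives in its entity group.
  comment1 = BlogComment.get_by_key_name("key1", blog_post)
  comment1.text = "new text"
  comment1.put()

# Called outside boo: runs the whole function as one transaction.
db.run_in_transaction(boo)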

This transaction gets reduced into the following steps:

  • BeginTransaction (called when run_in_transaction starts)
  • Zero or more Puts, Gets, Queries, or Deletes (called by the user’s function).
  • CommitTransaction (called when run_in_transaction ends)
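
The three steps above can be sketched with a stand-in that only records which calls it receives. Everything here is hypothetical (the RecordingAppDB class, its method names, and this simplified run_in_transaction); it is meant to show the order of calls, not AppScale's actual code. The rollback branch reflects the rollback behaviour mentioned later in this post.

```python
class RecordingAppDB:
    """Hypothetical stand-in for AppDB that records the calls it receives."""

    def __init__(self):
        self.calls = []

    def begin_transaction(self):
        self.calls.append("BeginTransaction")
        return "tx001"

    def get(self, txn_id, key):
        self.calls.append("Get")

    def put(self, txn_id, key, value):
        self.calls.append("Put")

    def commit_transaction(self, txn_id):
        self.calls.append("CommitTransaction")

    def rollback_transaction(self, txn_id):
        self.calls.append("RollbackTransaction")


def run_in_transaction(appdb, func):
    """Begin, run the user's function, then commit (or roll back on error)."""
    txn_id = appdb.begin_transaction()
    try:
        result = func(appdb, txn_id)
    except Exception:
        appdb.rollback_transaction(txn_id)
        raise
    appdb.commit_transaction(txn_id)
    return result


def boo(appdb, txn_id):
    appdb.get(txn_id, "key1")
    appdb.put(txn_id, "key1", "new text")


appdb = RecordingAppDB()
run_in_transaction(appdb, boo)
print(appdb.calls)
```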

Let’s break down each of those steps from the AppDB point of view.

Step 1: BeginTransaction

Transactions are identified by the entity group that they operate on. AppDB begins by asking ZooKeeper for a sequential node (a node whose name ends in a monotonically increasing ID) with the following path:

/appscale/apps/appid/txn_ids/tx001

where appid is the name of the application we’re running the transaction for, and tx001 means this is the first transaction being performed (the 001 is sequentially given to us by ZooKeeper).
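
As a rough sketch of this step, the snippet below uses a small in-memory stand-in for ZooKeeper instead of a real client. FakeZooKeeper, begin_transaction, and the app ID guestbook are all assumptions made for illustration. Real ZooKeeper appends a ten-digit, zero-padded counter to sequential node names, which the fake imitates; the tx001 in this post is shorthand for that. Numbering from 1 is a choice made here to echo the post.

```python
class FakeZooKeeper:
    """In-memory stand-in for the tiny slice of ZooKeeper used in this sketch."""

    def __init__(self):
        self.nodes = {}     # path -> value
        self.counters = {}  # parent path -> last sequence number issued

    def create(self, path, value=b"", sequence=False):
        if sequence:
            # Append a zero-padded counter that only ever increases per parent.
            parent = path.rsplit("/", 1)[0]
            number = self.counters.get(parent, 0) + 1
            self.counters[parent] = number
            path = "%s%010d" % (path, number)
        if path in self.nodes:
            raise ValueError("node already exists: %s" % path)
        self.nodes[path] = value
        return path


def begin_transaction(zk, app_id):
    """Create the sequential transaction node and return its full path."""
    prefix = "/appscale/apps/%s/txn_ids/tx" % app_id
    return zk.create(prefix, sequence=True)


zk = FakeZooKeeper()
first = begin_transaction(zk, "guestbook")
second = begin_transaction(zk, "guestbook")
print(first)
print(second)
```

Because the store hands out the suffix, two App Servers that begin transactions at the same time never end up with the same transaction path.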

Step 2: Put, Get, Query, Delete

BeginTransaction sets up the ZooKeeper path for the transaction, but doesn’t actually acquire any locks. We acquire locks in response to puts, gets, queries, and deletes. When a put, get, query, or delete happens, AppDB looks at the entity group the operation is occurring on. If we don’t have the lock for that entity group, we check the “lock path” for the transaction, located at:

/appscale/apps/appid/txn_ids/tx001/lockpath

If the lock path doesn’t exist (which it doesn’t for the first operation), then we create it and set its value to the entity group we’re operating on. If the lock path does exist, we look at its value and see if it’s the same as the entity group we’re operating on. If it is, we have the entity group locked and can safely operate on it. If it isn’t the same, then that means we’re trying to operate on more than one entity group, which isn’t allowed in the standard transaction model, so we roll back the transaction and abort it.

In our example, the get_by_key_name will call get, which causes the lock path to be created and set to the entity group that key1 belongs to (we’ll just call it key1). When the put happens, we look at the lock path, see that it exists and is set to key1, and thus proceed with the put operation.
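
The check described above can be sketched as follows. A plain dict stands in for ZooKeeper, and acquire_lock and TransactionAborted are hypothetical names. The sketch only models the check inside a single transaction: it leaves out the rollback work, transaction ID blacklisting, and contention between concurrent transactions.

```python
class TransactionAborted(Exception):
    """Raised when a standard transaction touches a second entity group."""


def acquire_lock(zk_nodes, txn_path, entity_group):
    """Record or verify the single entity group this transaction may touch."""
    lock_path = txn_path + "/lockpath"
    held = zk_nodes.get(lock_path)
    if held is None:
        # First operation in the transaction: create the lock path.
        zk_nodes[lock_path] = entity_group
    elif held != entity_group:
        # A second entity group is not allowed in the standard model.
        raise TransactionAborted(
            "transaction holds %r but tried to use %r" % (held, entity_group))
    # Otherwise the lock is already held for this group; nothing to do.


txn_path = "/appscale/apps/guestbook/txn_ids/tx001"
nodes = {txn_path: b""}
acquire_lock(nodes, txn_path, "key1")  # the get creates the lock path
acquire_lock(nodes, txn_path, "key1")  # the put finds it already set
```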

Step 3: CommitTransaction

Finally, the last step of a db.run_in_transaction is to commit the transaction. This is essentially the opposite of the BeginTransaction step, so we clean up all the transaction state we created earlier. We start by deleting the lock path from our transaction, as well as the sequential node we created earlier. Presuming that those delete operations succeed, we’re good to go!
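
A minimal sketch of that cleanup, again with a plain dict standing in for ZooKeeper and a hypothetical commit_transaction helper. The lock path goes first because ZooKeeper will not delete a node that still has children.

```python
def commit_transaction(zk_nodes, txn_path):
    """Remove the transaction's lock path, then its sequential node."""
    # The lock path may be absent if the transaction performed no operations.
    zk_nodes.pop(txn_path + "/lockpath", None)
    # Raises KeyError if the transaction node itself is missing.
    del zk_nodes[txn_path]


txn_path = "/appscale/apps/guestbook/txn_ids/tx001"
nodes = {txn_path: b"", txn_path + "/lockpath": "key1"}
commit_transaction(nodes, txn_path)
print(nodes)
```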

Omitted Details

For the sake of brevity and clarity, this example assumes that everything succeeded without any problems. We do implement rollback and transaction ID blacklisting for scenarios where there are problems acquiring ZooKeeper locks, or where a transaction tries to touch multiple entity groups. Interested readers can find the details in our writeup on the transaction system.

Extending this for Cross-Group Transactions

Now that we’ve shown how AppScale implements transaction support within a single entity group, let’s look at how we’ve expanded it to work on multiple entity groups (XG).

Let’s look at a cross-group transaction a user writes and see how this gets converted to calls to our database-agnostic layer. Suppose we have a transaction that gets two comments, for two different blog posts, sets their values, and stores them back in the Datastore:

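def boo():
  # Update a comment stored under blog_post1.
  comment1 = BlogComment.get_by_key_name("key1", blog_post1)
  comment1.text = "new text 1"
  comment1.put()

  # Update a comment stored under blog_post2, a second entity group.
  comment2 = BlogComment.get_by_key_name("key2", blog_post2)
  comment2.text = "new text 2"
  comment2.put()

# Called outside boo: opt in to XG, then run the function as one transaction.
xg_on = db.create_transaction_options(xg=True)
db.run_in_transaction_options(xg_on, boo)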

Like before, this transaction gets reduced into the following steps:

  • BeginTransaction
  • Zero or more Puts, Gets, Queries, or Deletes
  • CommitTransaction

Let’s break down what we change in AppDB to support cross-group transactions.

Step 1: BeginTransaction

This step is mostly the same as before, but after we create the transaction path (with the sequential ID), we look in the BeginTransaction request and see if the user has specified xg=True. If they have, we create a ZooKeeper node at the following path and set its value to True:

/appscale/apps/appid/txn_ids/tx001/is_xg

Step 2: Put, Get, Query, Delete

Datastore operations work much as they did before, but the lock path is replaced by a “lock list path”. Instead of pointing to one entity group, it holds a list of pointers to entity groups. The new path is called:

/appscale/apps/appid/txn_ids/tx001/lock_list_path

Like before, if the lock list path doesn’t exist (which it doesn’t for the first operation), then we create it and set its value to a list containing the entity group we’re operating on. If the lock list path does exist, we look at its value and see if it contains the entity group we’re operating on. If it does, we have the entity group locked and can safely operate on it. If it doesn’t, then we look at the is_xg node we set earlier. If that node doesn’t exist, we’re not in an XG transaction and this isn’t allowed, so we abort. If it does exist and is set to True, then we are in an XG transaction. We then count how many locks are already in the lock list (Google App Engine limits you to 5 locks for XG transactions). If acquiring this lock would push us over the limit, we abort the transaction. Otherwise, we add the entity group to the lock list and write the new list back to ZooKeeper.

In our example, the first get_by_key_name will call get, which causes the lock list path to be created and set to key1. When the first put happens, we look at the lock list path, see that it exists and contains key1, and thus proceed with the put operation. When the second get_by_key_name happens, we see that our entity group key2 isn’t in the lock list, but since xg=True is set, that’s OK, and we add key2 to the lock list and proceed. The second put occurs, and we see that key2 is in the lock list, so we proceed.
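
The XG variant of the lock check can be sketched like this. As before, a plain dict stands in for ZooKeeper, and acquire_lock_xg and TransactionAborted are hypothetical names; only the constant MAX_GROUPS_FOR_XG comes from AppScale. A real ZooKeeper node stores bytes, so the list and the True flag would need to be serialized, which this sketch skips.

```python
MAX_GROUPS_FOR_XG = 5  # same constant name as in zktransaction.py


class TransactionAborted(Exception):
    """Raised when the transaction may not lock another entity group."""


def acquire_lock_xg(zk_nodes, txn_path, entity_group):
    """Add entity_group to the transaction's lock list if the rules allow it."""
    list_path = txn_path + "/lock_list_path"
    locks = zk_nodes.get(list_path)
    if locks is None:
        # First operation: create the lock list with this one group.
        zk_nodes[list_path] = [entity_group]
        return
    if entity_group in locks:
        return  # already locked by this transaction
    if zk_nodes.get(txn_path + "/is_xg") is not True:
        raise TransactionAborted("second entity group in a non-XG transaction")
    if len(locks) + 1 > MAX_GROUPS_FOR_XG:
        raise TransactionAborted("XG transaction exceeds %d entity groups"
                                 % MAX_GROUPS_FOR_XG)
    # Write the extended list back to the (fake) ZooKeeper node.
    zk_nodes[list_path] = locks + [entity_group]


txn_path = "/appscale/apps/guestbook/txn_ids/tx001"
nodes = {txn_path: b"", txn_path + "/is_xg": True}
for group in ["key1", "key1", "key2", "key2"]:  # get, put, get, put
    acquire_lock_xg(nodes, txn_path, group)
print(nodes[txn_path + "/lock_list_path"])
```

Without the is_xg node, the third call (the first one to mention key2) would raise TransactionAborted, which matches the single-group behaviour described earlier.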

Step 3: CommitTransaction

This step is also similar to the non-XG version, but here, instead of deleting the lock path, we delete the lock list path. Presuming that those delete operations succeed, we’re good to go!

Conclusion

So that’s a quick writeup on how transactions work in AppScale. For the adventurous who are looking to operate on more than 5 entity groups at a time, check out appscale/AppDB/zkappscale/zktransaction.py and look for:

MAX_GROUPS_FOR_XG = 5

and change that to your heart’s content :)