Byzantine Reality

Searching for Byzantine failures in the world around us

Two Caching Strategies for App Engine Apps

Recently I took on a redesign of an old project of mine: Active Cloud DB. It’s a Python App Engine app that exposes a REST API to the Datastore, allowing clients written in any programming language to access Google’s scalable key-value datastore. However, the web frontend didn’t look too hot, and when I saw what Bootstrap could do, I knew I could use it to do justice to Active Cloud DB. So I did just that and made a new Active Cloud DB with Go App Engine and Bootstrap, and boy does it look a lot nicer now. Of course it’s all open source, so feel free to grab the code we’ll be talking about today and follow along at home.

Active Cloud DB provides a minimalist REST API for the Datastore, exposing four of the Datastore API’s operations: get, put, delete, and query. To speed up these operations, we also throw in two types of caching via the Memcache API: generational caching and write-through caching. The latter is the more familiar, so let’s go over that one first. With write-through caching, write operations (put and delete) hit both Memcache and the Datastore, while read operations (get) read the Memcache version, and only hit the Datastore if the data isn’t found in Memcache. Here’s what the code for a put looks like:

func put(w http.ResponseWriter, r *http.Request) {
  keyName := r.FormValue("key")
  value := r.FormValue("val")

  c := appengine.NewContext(r)

  key := datastore.NewKey("Entity", keyName, 0, nil)
  entity := new(Entity)
  entity.Value = value

  result := map[string]string{
    "error": "",
  }
  if _, err := datastore.Put(c, key, entity); err != nil {
    result["error"] = fmt.Sprintf("%s", err)
  }

  // Set the value to speed up future reads - errors here aren't
  // that bad, so don't worry about them
  item := &memcache.Item{
    Key: keyName,
    Value: []byte(value),
  }
  memcache.Set(c, item)
  bumpGeneration(c)

  fmt.Fprintf(w, "%s", mapToJson(result))
}
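The put handler calls two helpers that aren’t shown in this post: bumpGeneration, which belongs to the generational caching scheme discussed below, and mapToJson. Assuming mapToJson simply serializes the result map with encoding/json (an assumption on my part; the real helper may differ), a minimal sketch might be:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// mapToJson serializes a result map to JSON bytes.
// json.Marshal cannot fail for a map[string]string, so the error
// branch is defensive; a real implementation might log it instead.
func mapToJson(m map[string]string) []byte {
	b, err := json.Marshal(m)
	if err != nil {
		return []byte(`{"error":"could not encode result"}`)
	}
	return b
}

func main() {
	// An empty error field means the operation succeeded.
	fmt.Printf("%s\n", mapToJson(map[string]string{"error": ""}))
}
```

Since json.Marshal sorts map keys, the output is deterministic even for multi-key result maps.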

And here’s what the code for a get looks like:

func get(w http.ResponseWriter, r *http.Request) {
  keyName := r.FormValue("key")

  c := appengine.NewContext(r)

  result := map[string]string{
    keyName: "",
    "error": "",
  }

  if item, err := memcache.Get(c, keyName); err == nil {
    result[keyName] = fmt.Sprintf("%s", item.Value)
    fmt.Fprintf(w, "%s", mapToJson(result))
    return
  }

  key := datastore.NewKey("Entity", keyName, 0, nil)
  entity := new(Entity)

  if err := datastore.Get(c, key, entity); err == nil {
    result[keyName] = entity.Value

    // Set the value to speed up future reads - errors here aren't
    // that bad, so don't worry about them
    item := &memcache.Item{
      Key: keyName,
      Value: []byte(entity.Value),
    }
    memcache.Set(c, item)
  } else {
    result["error"] = fmt.Sprintf("%s", err)
  }

  fmt.Fprintf(w, "%s", mapToJson(result))
}
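To see the write-through pattern in isolation, here’s a self-contained sketch with plain maps standing in for the Datastore and Memcache (the names store, cache, putThrough, and getThrough are mine, not from Active Cloud DB):

```go
package main

import "fmt"

// store stands in for the Datastore; cache stands in for Memcache.
var (
	store = map[string]string{}
	cache = map[string]string{}
)

// putThrough writes to both the store and the cache, mirroring
// Active Cloud DB's write-through put.
func putThrough(key, val string) {
	store[key] = val
	cache[key] = val
}

// getThrough reads from the cache first and falls back to the
// store, repopulating the cache on a miss.
func getThrough(key string) (string, bool) {
	if v, ok := cache[key]; ok {
		return v, true
	}
	v, ok := store[key]
	if ok {
		cache[key] = v
	}
	return v, ok
}

func main() {
	putThrough("greeting", "hello")
	delete(cache, "greeting") // simulate a Memcache eviction
	v, _ := getThrough("greeting")
	fmt.Println(v) // refilled from the store: "hello"
}
```

The key property is that a cache eviction only costs one extra store read; the next get repopulates the cache automatically.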

The careful reader will have noticed that I haven’t talked about the query operation at all. That’s because query is a bit more complex than the others: the other operations name exactly which key they operate on, while an arbitrary query can touch any number of items. To cache queries, we employ a generational caching strategy. We keep an integer generation value in the Datastore, and whenever a query is performed, we tag it with the current generation value and store the result in Memcache. So a query for SELECT * FROM Entity performed on an initially empty database (with generation value 0) could be stored under the key SELECT * FROM Entity / 0. Whenever a write is performed (a put or a delete), we increment the generation value, so the next query looks up SELECT * FROM Entity / 1 instead. That implicitly invalidates all old query results and ensures we never serve stale query data. Our implementation only cares about a single query right now, so we simplify how we build the cache key, but the scheme works in general. The code for retrieving queries thus looks as follows:

func query(w http.ResponseWriter, r *http.Request) {
  c := appengine.NewContext(r)

  cacheKey := getCacheKey(c)
  // Only use the item on success - on ErrCacheMiss (or any other
  // error) item is nil, so checking err == nil avoids a nil dereference
  if item, err := memcache.Get(c, cacheKey); err == nil {
    fmt.Fprintf(w, "%s", item.Value)
    return
  }

  q := datastore.NewQuery("Entity")
  result := map[string]string{}
  for t := q.Run(c); ; {
    var entity Entity
    key, err := t.Next(&entity)
    if err == datastore.Done {
      break
    }
    if err != nil {
      // Record the error and stop - continuing would use a bad key
      result["error"] = fmt.Sprintf("%s", err)
      break
    }
    keyString := fmt.Sprintf("%s", key)
    result[keyString] = entity.Value
  }

  jsonResult := mapToJson(result)
  item := &memcache.Item{
    Key: cacheKey,
    Value: jsonResult,
  }
  memcache.Set(c, item)

  fmt.Fprintf(w, "%s", jsonResult)
}
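The helpers getCacheKey and bumpGeneration aren’t shown above, so here’s a self-contained sketch of the generational scheme with a plain map standing in for Memcache and an integer counter standing in for the Datastore-backed generation value (fakeCache, cacheKeyFor, and runQuery are my names for illustration, not Active Cloud DB’s):

```go
package main

import (
	"fmt"
	"strconv"
)

// fakeCache stands in for Memcache; gen stands in for the
// generation value Active Cloud DB keeps in the Datastore.
var (
	fakeCache = map[string]string{}
	gen       = 0
)

// cacheKeyFor tags a query with the current generation, so any
// write implicitly invalidates every previously cached query.
func cacheKeyFor(query string) string {
	return query + " / " + strconv.Itoa(gen)
}

// bumpGeneration is called on every put or delete.
func bumpGeneration() {
	gen++
}

// runQuery returns the cached result for the current generation,
// or "runs" the query and caches it. The bool reports a cache hit.
func runQuery(query string) (string, bool) {
	key := cacheKeyFor(query)
	if v, ok := fakeCache[key]; ok {
		return v, true // hit: result is fresh for this generation
	}
	// Miss: pretend we ran the query against the Datastore.
	result := "result@gen" + strconv.Itoa(gen)
	fakeCache[key] = result
	return result, false
}

func main() {
	_, hit := runQuery("SELECT * FROM Entity")
	fmt.Println("first query hit:", hit) // false: nothing cached yet
	_, hit = runQuery("SELECT * FROM Entity")
	fmt.Println("repeat query hit:", hit) // true: same generation
	bumpGeneration()                      // a write happened
	_, hit = runQuery("SELECT * FROM Entity")
	fmt.Println("post-write hit:", hit) // false: old entry is stale
}
```

Note that stale entries are never deleted, only orphaned; in real Memcache they simply age out under the LRU policy, which is what makes this invalidation scheme so cheap.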

With that, you now know how to use Memcache to cache Datastore accesses in Go. For more details on the Python implementation and an evaluation, see our CloudComp paper. I hope that piqued your interest in Go and App Engine, so get coding!