Byzantine Reality

Searching for Byzantine failures in the world around us

Placement Support in AppScale

As of AppScale 1.4, cloud administrators can manually specify which machines should host each of the critical services in AppScale. This post outlines how to use this feature to explore alternative deployment schemes that may give your users’ applications better performance or make them more fault tolerant.
This post, with a few changes, is a cross-post of the document I wrote on the AppScale Google Code page.

What are the critical services? In order to host your applications, AppScale employs the following components:

  • Load balancer: The Ruby on Rails web service that routes users to their Google App Engine applications. It also hosts a status page that reports on the state of all machines in the currently running AppScale deployment.
  • App Engine: Our modified version of the non-scalable Google App Engine SDKs that adds in the ability to store data to and retrieve data from databases that support the Google Datastore API.
  • Database: Runs all the services needed to host the chosen database.
  • Login: The primary machine that is used to route users to their Google App Engine applications. Note that this differs from the load balancer: many machines can run load balancers and route users to their applications, but only one machine is reported to the cloud administrator when running appscale-run-instances and appscale-upload-app.
  • ZooKeeper: Hosts metadata needed to implement database-agnostic transaction support.
  • Shadow: Queries the other nodes in the system to record their state and ensure that they are still running all the services they are supposed to be running.
  • Open: The machine runs nothing by default, but the shadow machine can dynamically take over this node to use it as needed.

What happens by default? The default deployment employs an ips.yaml that resembles the following:

ips.yaml
---
:controller: 192.168.1.2
:servers:
- 192.168.1.3
- 192.168.1.4
- 192.168.1.5

This deployment employs a controller and a number of servers. These “aggregate roles” each run a number of roles in the system:

  • Controller: Shadow, load balancer, database, login, and ZooKeeper
  • Servers: App Engine, database, and load balancer
  • Master (not shown here): Shadow, load balancer, and ZooKeeper
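To make the aggregate roles concrete, here is a minimal Python sketch that expands a layout into the individual roles each machine runs. It is not AppScale code: the role labels, the `AGGREGATE_ROLES` table, and the `expand_roles` helper are hypothetical names that simply mirror the list above, and the layout is written as a plain dictionary rather than parsed from ips.yaml.

```python
# Role sets copied from the list above. The labels are illustrative,
# not identifiers taken from the AppScale source.
AGGREGATE_ROLES = {
    "controller": ["shadow", "load_balancer", "database", "login", "zookeeper"],
    "servers": ["appengine", "database", "load_balancer"],
    "master": ["shadow", "load_balancer", "zookeeper"],
}


def expand_roles(layout):
    """Map each machine to the sorted list of individual roles it runs."""
    roles_by_node = {}
    for key, nodes in layout.items():
        # A single address (e.g. the controller) is treated as a one-item list.
        if isinstance(nodes, str):
            nodes = [nodes]
        # Keys that are not aggregate roles (e.g. "appengine") name themselves.
        roles = AGGREGATE_ROLES.get(key, [key])
        for node in nodes:
            roles_by_node.setdefault(node, set()).update(roles)
    return {node: sorted(roles) for node, roles in roles_by_node.items()}


default_layout = {
    "controller": "192.168.1.2",
    "servers": ["192.168.1.3", "192.168.1.4", "192.168.1.5"],
}

for node, roles in expand_roles(default_layout).items():
    print(node, roles)
```

Running this on the default layout shows the controller carrying five roles and each server carrying three.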

How do I specify where they run? It’s simple! Change your ips.yaml to specify exactly which machines should run which services in your deployment. Here’s an example:

ips.yaml
---
:master: 192.168.1.2
:appengine:
- 192.168.1.3
- 192.168.1.4
:database:
- 192.168.1.5

Because no machine has been specified as the login node, the master node automatically takes up this role. Therefore, one node (192.168.1.2) routes users to their App Engine applications, hosted at 192.168.1.3 and 192.168.1.4. Furthermore, in this style of deployment, these two nodes host only App Engine applications and no longer host databases as well, as they did in the standard deployment. Finally, one machine (192.168.1.5) hosts the database in the system.
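The login fallback described here can be sketched in a few lines of Python. This only illustrates the rule as stated in this post, not AppScale's actual implementation; `resolve_login` is a hypothetical helper and the layout is a plain dictionary.

```python
def resolve_login(layout):
    """Return the node that acts as the login node for a custom layout.

    Rule taken from the text: when no "login" entry is given,
    the master node takes up the role.
    """
    login = layout.get("login")
    if login:
        # A login entry may be written as a one-item list or a bare address.
        return login[0] if isinstance(login, list) else login
    return layout["master"]


layout = {
    "master": "192.168.1.2",
    "appengine": ["192.168.1.3", "192.168.1.4"],
    "database": ["192.168.1.5"],
}
print(resolve_login(layout))
```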

Let’s look at another example:

ips.yaml
---
:master: 192.168.1.2
:appengine:
- 192.168.1.3
- 192.168.1.4
:database:
- 192.168.1.3
- 192.168.1.4
:login:
- 192.168.1.5

In this example, one node (192.168.1.5) routes users to their App Engine applications and performs no other functions. Two nodes (192.168.1.3 and 192.168.1.4) host App Engine applications and host the chosen database. Finally, one node (192.168.1.2) queries the other nodes in the system to ensure they are running properly and handles transactions via ZooKeeper.

Using Placement Support in Cloud Deployments

But how do you use this placement support in Eucalyptus and Amazon EC2? It’s simple – just replace each of the IPs in your ips.yaml with node-X (where X is an integer). Here’s an example using the standard deployment:

ips.yaml
---
:controller: node-1
:servers:
- node-2
- node-3
- node-4

And here’s the second example again, but modified for use on cloud infrastructures:

ips.yaml
---
:master: node-1
:appengine:
- node-2
- node-3
:database:
- node-4

And for completeness, here’s the third example once more:

ips.yaml
---
:master: node-1
:appengine:
- node-2
- node-3
:database:
- node-2
- node-3
:login:
- node-4
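The IP-to-node-X substitution is mechanical, so it can be scripted. The sketch below is one way to do it in Python, assuming the layout is already loaded into a dictionary. `to_cloud_layout` is a hypothetical helper, not part of the AppScale tools, and numbering nodes in first-seen order is a choice made here for illustration.

```python
def to_cloud_layout(layout):
    """Replace each distinct IP with node-1, node-2, ... in first-seen order."""
    names = {}

    def name_for(ip):
        # Reuse the same placeholder when an IP appears under several roles.
        if ip not in names:
            names[ip] = "node-%d" % (len(names) + 1)
        return names[ip]

    cloud = {}
    for role, nodes in layout.items():
        if isinstance(nodes, str):
            cloud[role] = name_for(nodes)
        else:
            cloud[role] = [name_for(ip) for ip in nodes]
    return cloud


third_example = {
    "master": "192.168.1.2",
    "appengine": ["192.168.1.3", "192.168.1.4"],
    "database": ["192.168.1.3", "192.168.1.4"],
    "login": ["192.168.1.5"],
}
print(to_cloud_layout(third_example))
```

Because a machine that appears under two roles keeps one placeholder, the App Engine and database entries in the third example both map to node-2 and node-3.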

A Note About Databases

Some databases (Cassandra, Voldemort) run in a peer-to-peer fashion, so all database nodes are considered equal. But others employ some type of master-slave relationship – how then do we specify which node is the database master and which are the database slaves? That’s simple – the first database node specified is always the database master.
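As a small illustration of that ordering rule, the following Python sketch splits a database list into its master and slaves. `split_database_nodes` is a hypothetical helper written for this post, not an AppScale function.

```python
def split_database_nodes(database_nodes):
    """Return (master, slaves): the first listed node is the database master."""
    master, *slaves = database_nodes
    return master, slaves


master, slaves = split_database_nodes(["node-2", "node-3"])
print("master:", master, "slaves:", slaves)
```

So to change which machine becomes the database master, reorder the entries under :database: in ips.yaml.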

Impact of Placement Support on Performance and Fault Tolerance

As AppScale is designed to be a platform for experimentation, this support lets cloud administrators choose layouts that may give better (or worse) performance, as well as more or less fault tolerance, in their system. Let’s examine this through a familiar example:

ips.yaml
---
:master: node-1
:appengine:
- node-2
- node-3
:database:
- node-4

Here, performance is likely to be better under lower load because there is only one database node: many of the internal agreement protocols are vastly simplified when only one node is involved. However, this node is now a single point of failure in the system. If it goes down, users won’t be able to read or write data.

Let’s look at another familiar example, the default deployment:

ips.yaml
---
:controller: node-1
:servers:
- node-2
- node-3
- node-4

This deployment gives us four database nodes (one for each node in the system) and three App Engine nodes, giving us vastly better fault tolerance than in the previous deployment. Of course, this is only with respect to App Engine nodes and database nodes: the shadow (specified as master in the previous example and controller in this example) is still a single point of failure. Ongoing work is looking to alleviate this problem.
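One rough way to compare layouts is to count how many machines run each role. The Python sketch below does this for the two deployments discussed in this section. It rests on the role lists given earlier in this post, and `AGGREGATE_ROLES` and `role_counts` are hypothetical names, not AppScale code. A role that runs on a single machine is only a candidate single point of failure. The count says nothing about how AppScale actually recovers from a failure.

```python
from collections import Counter

# Role sets as listed earlier in this post. The labels are illustrative.
AGGREGATE_ROLES = {
    "controller": ["shadow", "load_balancer", "database", "login", "zookeeper"],
    "servers": ["appengine", "database", "load_balancer"],
    "master": ["shadow", "load_balancer", "zookeeper"],
}


def role_counts(layout):
    """Count how many distinct machines run each role in a layout."""
    nodes_by_role = {}
    for key, nodes in layout.items():
        if isinstance(nodes, str):
            nodes = [nodes]
        for role in AGGREGATE_ROLES.get(key, [key]):
            nodes_by_role.setdefault(role, set()).update(nodes)
    return Counter({role: len(nodes) for role, nodes in nodes_by_role.items()})


single_database = {
    "master": "node-1",
    "appengine": ["node-2", "node-3"],
    "database": ["node-4"],
}
default_layout = {
    "controller": "node-1",
    "servers": ["node-2", "node-3", "node-4"],
}

for name, layout in [("single database", single_database), ("default", default_layout)]:
    counts = role_counts(layout)
    print(name, "database:", counts["database"], "appengine:", counts["appengine"])
```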

Wrapping Things Up

This document outlines a number of ways by which cloud administrators can manually specify where AppScale’s critical services should run. Explore the various deployment options that are now available, let us know what you’re using, and enjoy!