Byzantine Reality

Searching for Byzantine failures in the world around us

Logging in AppScale

Logging is one of those things that programmers take for granted. We rely on logging to bring important debugging and error info to the forefront, and that’s no different in web applications. One particularly nice thing about working at the platform-as-a-service layer is that it can control your runtime, allowing it to automatically collect your logs and display them to you. In AppScale, our upcoming 1.7.0 release takes this a step further, allowing you to retrieve both application-level logs and system-level logs via both a web or command-line interface. This post details how AppScale implements this support for arbitrary Google App Engine applications, what the interfaces look like that users interact with, and how other services can leverage this support.

Storing and Querying Logs via the AppDashboard

Let’s start off by looking at how we can store and retrieve logs. Ideally, we need the following:

  • A replicated database, with the ability to query based on the name of the service that stores logs or the hostname of the machine they came from.
  • A caching layer in front of the database, to speed up repeated accesses to this data.
  • A simple web programming framework, to enable users and applications to access the data.
  • Authentication, to ensure that only the right users can see the data.

By no small coincidence, these are all things that the Google App Engine APIs offer! We can use NDB to store and retrieve our logs, which automatically leverages the the Memcache API to speed up accessing our log data. Programming Python 2.7 apps on Google App Engine is a breeze with webapp2, and we can use the Google App Engine Users API if we need authentication when viewing logs. We thus use webapp2 to expose a single route in our AppDashboard, at /logs/upload, that accepts POST requests to store log data. That code performs the following task:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
def post(self):
  """ Saves logs records to the Datastore for later viewing. """
  encoded_data = self.request.body
  data = json.loads(encoded_data)
  service_name = data['service_name']
  host = data['host']
  log_lines = data['logs']

  # First, check to see if this service has been registered.
  service = LoggedService.get_by_id(service_name)
  if service is None:
    service = LoggedService(id = service_name)
    service.hosts = [host]
    service.put()
  else:
    if host not in service.hosts:
      service.hosts.append(host)
      service.put()

  # Next, add in each log line as an AppLogLine
  for log_line_dict in log_lines:
    the_time = int(log_line_dict['timestamp'])
    reversed_time = (2**34 - the_time) * 1000000
    key_name = service_name + host + str(reversed_time)
    log_line = RequestLogLine.get_by_id(id = key_name)
    if not log_line:
      log_line = RequestLogLine(id = key_name)
      log_line.service_name = service_name
      log_line.host = host

   app_log_line = AppLogLine()
   app_log_line.message = log_line_dict['message']
   app_log_line.level = log_line_dict['level']
   app_log_line.timestamp = datetime.datetime.fromtimestamp(the_time)
   app_log_line.put()
   log_line.app_logs.append(app_log_line)
   log_line.put()

Here, our POST request expects a JSON-encoded body that contains the name of the service posting one or more logs (e.g., “guestbook”, “appcontroller”), the hostname of the machine posting the logs, and the actual logs themselves. We then store these logs with a key that decreases as time increases, so that we can do a SELECT * query to get our logs sorted from newest to oldest (pretty handy when viewing logs). So that’s how we store logs in the Datastore – let’s look at how our modified AppServers send them to the AppDashboard.

Logging in Python App Engine apps

The Python AppServer makes it pretty simple to send all the logs for a given request all at once. In logservice.py, we can see that all the logs for a request get batched up together, so we just send them off to our AppDashboard as follows:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
def _flush(self):
  """Internal version of flush() with no locking."""

  logs = self.parse_logs()

  appid = os.environ['APPLICATION_ID']
  if appid in ['apichecker', 'appscaledashboard']:
    return

  formatted_logs = [{'timestamp' : log[0] / 1e6, 'level' : log[1],
    'message' : log[2]} for log in logs]

  payload = json.dumps({
    'service_name' : appid,
    'host' : self.get_my_public_ip(),
    'logs' : formatted_logs
  })

  conn = httplib.HTTPSConnection(self.get_login_ip() + ":443")
  headers = {'Content-Type' : 'application/json'}
  conn.request('POST', '/logs/upload', payload, headers)
  response = conn.getresponse()
  self._clear()

So here, we skip sending the logs if we’re in the API Checker app (which gets started before the AppDashboard and thus can’t send logs to it) or the AppDashboard itself (which would create an infinite loop if it tried sending logs to itself, because that process causes logs to get generated). We then grab out the log level, message, and timestamp for each log, and package it up in the nice format described above. We ship that off to the AppDashboard and call it a day! This results in a single HTTPS connection per request, unless the buffer that the logs are stored in gets too large (in which case its one HTTPS connection per flush).

Logging in Java App Engine apps

Shipping off logs in the Java AppServer happens in a very similar fashion as in the Python AppServer, with the main difference being that the Java AppServer doesn’t buffer up all the log lines for a single web request. That means we have to send them off one at a time, which does not perform as nicely as the Python version, but is not the end of the world. The code looks pretty similar in LocalLogService.java:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
public synchronized void addAppLogLine(String requestId, long time, int level, String message) {
  if (message == null) {
    return;
  }

  LogServicePb.LogLine line = new LogServicePb.LogLine();
  line.setTime(time);
  line.setLevel(level);
  line.setLogMessage(message);

  // Send the log to the Admin Console for later viewing
  HashMap<String, Object> logHash = new HashMap<String, Object>();
  logHash.put("timestamp", time / 1e6);
  logHash.put("level", level);
  logHash.put("message", message);
  HashMap<String, Object>[] logList = new HashMap[1];
  logList[0] = logHash;

  Gson gson = new Gson();
  HashMap<String, Object> data = new HashMap<String, Object>();
  data.put("service_name", System.getProperty("APPLICATION_ID"));
  data.put("logs", logList);

  InputStream inStream = null;
  BufferedInputStream buf = null;
  InputStreamReader inStreamReader = null;
  BufferedReader bufferedReader = null;
  Runtime r = Runtime.getRuntime();
  String result = null;
  String base64Data = DatatypeConverter.printBase64Binary(gson.toJson(data).getBytes());
  // finally, send the logs!
  // omitted for space concerns

This Java code once again packages up the log level, message, and timestamp, sending it off to the AppDashboard. As each Python or Java App Engine app uniquely identifies itself via its appid, we can use it to pass logs from any App Engine app to the AppDashboard and not have to worry about anything else!

Logging in the AppController

The above scheme is great for web service applications, but we thought it would be great if other services could leverage the automation that this system provides. With that, we expand our dispatching service, the AppController, to also send its logs to the AppDashboard. We augment our logger to batch up its logs into an Array, which it can then send to the AppDashboard:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
# Sends all of the logs that have been buffered up to the Admin Console for
# viewing in a web UI.
def flush_log_buffer()
  APPS_LOCK.synchronize {
    encoded_logs = JSON.dump({
    'service_name' => 'appcontroller',
    'host' => my_node.public_ip,
    'logs' => @@logs_buffer,
  })

  url = URI.parse("https://#{get_login.private_ip}/logs/upload")
  http = Net::HTTP.new(url.host, url.port)
  http.use_ssl = true
  response = http.post(url.path, encoded_logs,
    {'Content-Type'=>'application/json'})

  Djinn.log_debug("Wrote #{@@logs_buffer.length} logs!")
  @@logs_buffer = []
 }
end

This code is fairly straightforward – just like before, it grabs all the logs (which have been nicely batched up for us), gets them into the right JSON format for the AppDashboard, and ships them off. As the AppController is service-aware, it batches up logs until the AppDashboard is started, and then ships them off. This means that when AppScale starts up, there’s initially about 200 logs batched up, but once AppScale gets to a steady state, it sends only about half a dozen log messages every 30 seconds. We’re working on optimizations here to not send all these logs at once, but like in the Python AppServer, split them up into batches.

Conclusion and Future Work

This covers how the Python and Java AppServers send their logs to the new AppDashboard web UI, as well as how we extend this to also include support for our AppController daemon. In the future, we’re looking to expand this to include other AppScale system services, like our database-agnostic layer (AppDB). We’d love to have a hand on it, so feel free to join us in #appscale on freenode.net and let us know what you think!