Byzantine Reality

Searching for Byzantine failures in the world around us

Porting Over to Octopress

So for quite a while I’ve been in a blogging funk. But with fellow AppScale co-founder Navraj Chohan blogging on an almost daily basis, I figured I would try to start blogging again. With that in mind, I recently spent some time moving my blog from bloog to Octopress. But why, you say? Read on to find out!

My blog’s been hosted on Google App Engine for the last few years and I’ve been mostly happy with it. The pros include:

  • No server management
  • No software management
  • Pay only for what you use (which in my case was nothing, instead of the ~$200 I was paying for a VPS to run WordPress)

However, those awesome pros came with one nasty con:

  • No WordPress. (As of June 2013, you can now run WordPress on App Engine by following a tutorial.)

I was left to find a way to run a piece of blog software that fit the constraints of the App Engine programming model. Originally I wanted to write my own, but after stumbling upon bloog, I decided to use it instead. It’s served me very well for the last few years, but a few annoying bugs made it a hassle to use:

  • HTML would often be rendered incorrectly. For example, code blocks would number their lines starting from 1 instead of using the actual line numbers.
  • Images would have to be hosted elsewhere (like imgur).

This made it just annoying enough to keep me from getting really excited about blogging. So when I heard that App Engine now supported PHP and WordPress, I gave it a look. However, once I realized that I could use Octopress instead, and that it would likely be much faster, I pivoted over to it. Now I’m running entirely on Octopress, hosted on App Engine. To do so, I simply forked Octopress and dropped this app.yaml file into the top-level directory:

app.yaml
application: byzantinereality
version: 1
runtime: python27
api_version: 1
threadsafe: yes

handlers:
- url: /(.*)/
  static_files: public/\1/index.html
  upload: public/(.*)/index.html

- url: /
  static_files: public/index.html
  upload: public/index.html

- url: /
  static_dir: public

skip_files:
- source
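Since handler order matters here, it can help to trace a few request paths through these three handlers. The Ruby below is a rough model of that lookup, not anything App Engine itself runs: `static_file_for` is a hypothetical helper, and it assumes that App Engine tries handlers top to bottom, that a static_files handler’s url regex must match the whole path, and that a static_dir handler treats its url as a prefix.

```ruby
# Hypothetical helper (not an App Engine API): models the three handlers
# in app.yaml, tried top to bottom.
def static_file_for(path)
  case path
  when %r{\A/(.*)/\z}
    # - url: /(.*)/  ->  public/\1/index.html (pretty permalinks)
    "public/#{Regexp.last_match(1)}/index.html"
  when "/"
    # - url: /  ->  public/index.html (the front page)
    "public/index.html"
  else
    # - url: /  with static_dir: public (CSS, images, feeds, ...)
    "public#{path}"
  end
end

%w[/ /blog/archives/ /stylesheets/screen.css].each do |path|
  puts "#{path} -> #{static_file_for(path)}"
end
```

In this model, a pretty permalink requested without its trailing slash (say /blog/archives) falls through to the static_dir handler and maps to a directory rather than to an index.html file.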

The app.yaml above tells the App Engine runtime that we only want to host a bunch of static files (the generated site in public/), and that it shouldn’t upload anything in the source directory, since those Markdown sources and templates are compiled into public/ and never served directly. Next, I had to port my old posts from the bloog format (stored in the App Engine Datastore) to Markdown. I first tried the App Engine bulkloader. It downloaded my data as a sqlite3 database, but I couldn’t work out where it had actually stored my posts. I then tried exporting them as XML or CSV instead, but the bulkloader documentation didn’t make clear which parameters it needed or how to specify them.

So I fell back on scraping the HTML from my blog: I saved each page, then ran this script over the saved pages. It’s written in Ruby and uses the awesome Nokogiri gem:

converter.rb
require 'nokogiri'

# Each saved page from the old bloog site can hold several posts.
Dir.glob("old-blog/*.html").each do |file|
  # First, parse our page full of blog posts.
  doc = Nokogiri::HTML(File.read(file))

  doc.css("div.post").each do |post|
    # Get the date of the post (this assumes it is already in the
    # YYYY-MM-DD form that Octopress post filenames use).
    date = post.at_css("span.date").text.strip

    # Get the title of the post.
    title = post.at_css("h2 a").text.strip

    # Build an octofriendly slug: whitespace becomes dashes, then anything
    # that isn't a letter, digit, underscore, or dash is removed.
    sanitized_title = title.gsub(/\s+/, "-").gsub(/[^\w-]/, "")

    # Strip double quotes so the title can sit inside the double-quoted
    # YAML string in the front matter below.
    title_without_quotes = title.delete('"')

    # Get the content of the post as raw HTML.
    content = post.at_css("div.entry").to_html

    # Next, write an Octopress-friendly post with this info.
    octoname = "source/_posts/#{date}-#{sanitized_title}.markdown"

    octocontent = <<QUUX
---
layout: page
title: "#{title_without_quotes}"
date: #{date} 12:00
comments: false
sharing: true
footer: true
---
#{content}
QUUX

    File.write(octoname, octocontent)
  end
end
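The file naming and front matter are the fiddliest parts of converter.rb, so here’s a sketch that pulls them into small helpers you can check without any scraped HTML. The helper names (`octopress_slug`, `octopress_path`, `front_matter`) are hypothetical, and in place of stripping double quotes from titles, this version swaps in Ruby’s standard YAML library to quote each title safely. Like the original, it assumes the scraped date is already in YYYY-MM-DD form.

```ruby
require 'yaml'

# Hypothetical helper: the same slug rules as converter.rb. Runs of
# whitespace become dashes, then anything that isn't a letter, digit,
# underscore, or dash is dropped.
def octopress_slug(title)
  title.gsub(/\s+/, "-").gsub(/[^\w-]/, "")
end

# Hypothetical helper: the post's path under source/_posts, assuming
# `date` is already in YYYY-MM-DD form.
def octopress_path(date, title)
  "source/_posts/#{date}-#{octopress_slug(title)}.markdown"
end

# Hypothetical helper: the same front matter fields as converter.rb, but
# serialized with Ruby's standard YAML library, so quotes and colons in a
# title are escaped instead of stripped.
def front_matter(title, date)
  fields = {
    "layout" => "page",
    "title" => title,
    "date" => "#{date} 12:00",
    "comments" => false,
    "sharing" => true,
    "footer" => true,
  }
  fields.to_yaml + "---\n" # to_yaml already emits the opening "---"
end

# Example values, purely for illustration.
title = 'Porting Over to "Octopress"'
puts octopress_path("2010-01-15", title)
# prints source/_posts/2010-01-15-Porting-Over-to-Octopress.markdown
puts front_matter(title, "2010-01-15")
```

Letting the YAML library do the quoting means a title like `Say "hi": a test` survives intact, where a hand-built double-quoted string would either break or need characters removed.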

converter.rb doesn’t truly convert my HTML to Markdown (it just wraps the raw HTML in Octopress front matter), but it’s good enough for now. To improve the performance of my app, I also had a cron job set up in App Engine that hit the main page (the / route) every minute, which kept an instance warm so visitors didn’t wait on a cold start. This was cool (since I didn’t have to pay for resident instances), but it meant that I was always using 24 instance hours a day, regardless of how many people actually showed up. Here’s what my instance usage looked like at 10am yesterday (which was typical on bloog):

And here’s what it looks like now that it’s all hosted as static files, at 3pm today (which is typical on Octopress):

Much better! I also don’t need the cron job anymore: serving static files on App Engine doesn’t spin up any instances, so there are no instances to keep alive.
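To see why the cron job pinned usage at 24 instance hours a day no matter how much traffic showed up, here’s a toy Ruby model. It’s assumption-heavy: `instance_hours` is a hypothetical helper, it models a single instance, and the 15-minute idle timeout is an illustrative number rather than App Engine’s actual scheduler behavior. With a ping every minute, each request arrives long before the instance would go idle, so it never shuts down.

```ruby
# Toy model (assumption-heavy): a single instance that stays up for
# `idle_timeout` minutes after each request it serves. The 15-minute
# default is illustrative, not App Engine's documented behavior.
def instance_hours(request_minutes, idle_timeout: 15, day_minutes: 24 * 60)
  alive = Array.new(day_minutes, false)
  request_minutes.each do |start|
    # Mark every minute this request keeps the instance alive,
    # clipped to the end of the day.
    stop = [start + idle_timeout, day_minutes].min
    (start...stop).each { |minute| alive[minute] = true }
  end
  alive.count(true) / 60.0
end

# A cron job hitting / every minute, all day long.
cron_pings = (0...24 * 60).to_a
puts instance_hours(cron_pings) # prints 24.0

# The same day with only three (hypothetical) visits and no cron job.
puts instance_hours([600, 610, 900]) # 40 minutes alive, about 0.67 hours
```

Swapping the per-minute pings for a few scattered visits leaves the modeled instance alive for only a fraction of the day; serving static files skips the instance entirely.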

Octopress is actually really awesome. I was up and running with it in minutes, and since it’s just static HTML, I don’t have to worry about the pain of porting somewhere else if I ever need to (there’s no proprietary data format or database schema that I’d have to take with me). I definitely recommend checking it out!