February 26th, 2009 chris
Now that our coverage of the Big 5 Google papers has concluded, I thought it would be handy to collect the links to all of them in one spot. Here are the links to our series of reviews, sorted by post date (which is also the order in which the papers were published):
- Thoughts on the Google File System: How to build a fault-tolerant file system, optimized for huge data sets, on which everything else is built.
- Thoughts on MapReduce: How to write programs that process huge amounts of data as quickly as possible by harnessing the vast resources available.
- Thoughts on BigTable: How to store data in a more structured format while keeping replication and high-speed processing.
- Thoughts on Chubby: How to locate specific nodes serving a specific purpose through the use of a highly available naming service.
- Thoughts on Paxos: How to implement a consensus algorithm that ensures agreement amongst nodes in the presence of failures while maintaining high performance.
So check them out if you haven’t, and be sure to come back next time for our review of Amazon’s Dynamo!
Posted in Uncategorized | No Comments »
February 26th, 2009 chris
After a month of Google papers, we finally come to the last paper on our Google reading list: Paxos Made Live. Like the Chubby paper from last week, this paper doesn’t present any new research; instead, it tells us how to take the existing research and build a system from it that can withstand incredible load. It also aims to show how we can gain confidence in a non-trivial amount of code, and which optimizations can improve the performance and reliability of the system. So how did they do, and how did it turn out? Read on!
Posted in Uncategorized | No Comments »
February 20th, 2009 chris
The last few times around, we’ve been talking about performance. Extreme performance. We’ve talked about how fast the Google File System is, how fast MapReduce is, and how fast BigTable is. Yet all three rely on a common service we hinted at last time: a lock service (amongst other things) named Chubby, the topic of today’s discussion. Chubby doesn’t bill itself as high-performance like the other services we’ve mentioned; instead, it is designed with high availability in mind. Let’s dive in and see what exactly makes up Chubby and why it’s so special in the Google world.
Posted in Uncategorized | No Comments »
February 13th, 2009 chris
Now that we’ve looked at the Google File System and MapReduce, we come to the next paper on our reading list: BigTable. It came out two years after the MapReduce paper and presents an interesting new step in databases. Like MapReduce, it is a notable paper that takes a very familiar programming construct to new heights. Let’s see if I can whet your appetite with a spoiler:
Maybe a better name for it is Big Hash Table.
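To make the quip a bit more concrete: the BigTable paper describes a table as a sparse, sorted map indexed by a row key, a column key, and a timestamp, where each value is an uninterpreted array of bytes. Below is a toy, single-process sketch of just that data model, assuming a plain in-memory Python dict. The class and method names (`ToyBigTable`, `put`, `get`, `scan`) are hypothetical and are not BigTable's actual API, and the row and column names in the usage lines are only illustrative. Note one difference from a plain hash table: rows are kept in sorted order, which the `scan` method imitates.

```python
from typing import Dict, Iterator, List, Optional, Tuple


class ToyBigTable:
    """Hypothetical in-memory model of (row, column, timestamp) -> bytes.

    Illustrates the data model only; it has no tablets, replication, or persistence.
    """

    def __init__(self) -> None:
        # row key -> column key -> list of (timestamp, value), newest first
        self._rows: Dict[str, Dict[str, List[Tuple[int, bytes]]]] = {}

    def put(self, row: str, column: str, timestamp: int, value: bytes) -> None:
        versions = self._rows.setdefault(row, {}).setdefault(column, [])
        versions.append((timestamp, value))
        # Keep versions ordered newest first so reads find the latest value quickly.
        versions.sort(key=lambda tv: tv[0], reverse=True)

    def get(self, row: str, column: str, timestamp: Optional[int] = None) -> Optional[bytes]:
        """Return the newest value at or before `timestamp` (or the newest overall)."""
        for ts, value in self._rows.get(row, {}).get(column, []):
            if timestamp is None or ts <= timestamp:
                return value
        return None

    def scan(self, start_row: str, end_row: str) -> Iterator[Tuple[str, Dict[str, bytes]]]:
        """Yield rows in [start_row, end_row) in sorted key order with their latest values."""
        for row in sorted(self._rows):
            if start_row <= row < end_row:
                latest = {col: versions[0][1] for col, versions in self._rows[row].items() if versions}
                yield row, latest


# Usage: several timestamped versions of one cell, plus a second column family.
table = ToyBigTable()
table.put("com.cnn.www", "contents:", 3, b"<html>v3")
table.put("com.cnn.www", "contents:", 5, b"<html>v5")
table.put("com.cnn.www", "anchor:cnnsi.com", 9, b"CNN")
print(table.get("com.cnn.www", "contents:"))     # b'<html>v5'
print(table.get("com.cnn.www", "contents:", 4))  # b'<html>v3'
```

Storing every version of a cell, rather than overwriting it, is what makes the timestamp dimension useful: a read can ask for the latest value or for the value as of some earlier time.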
Posted in Uncategorized | 1 Comment »
February 6th, 2009 chris
Last time we looked at the Google File System paper. Next on the list, chronologically, is the famous MapReduce paper. This is the paper that drew my attention more than any other, as I heard about it years ago but never really got a chance to review it thoroughly. In a lot of ways, this work is the polar opposite of GFS: whereas GFS is very low-level, being a file system and all, MapReduce sits at a much higher level. Much like GFS, it provides excellent results on an extremely constrained set of problems, but instead of being aimed at files, it is intended for programmers. And like GFS, the MapReduce paper has been out for a few years, so let’s see what’s been done with it and where its future could be going.
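The "extremely constrained set of problems" consists of computations that can be expressed as a map function, which emits intermediate key/value pairs, and a reduce function, which merges all values sharing the same key. Word counting is the classic example from the MapReduce paper. Here is a minimal single-process sketch in Python; `map_fn`, `reduce_fn`, and `run_mapreduce` are hypothetical names, and the in-memory grouping step merely stands in for the distributed shuffle that the real system performs across many machines.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def map_fn(doc_name: str, contents: str) -> Iterator[Tuple[str, int]]:
    # Emit (word, 1) for every whitespace-separated word in the document.
    for word in contents.split():
        yield word, 1


def reduce_fn(word: str, counts: Iterable[int]) -> int:
    # Sum all the partial counts emitted for one word.
    return sum(counts)


def run_mapreduce(
    inputs: Dict[str, str],
    mapper: Callable[[str, str], Iterable[Tuple[str, int]]],
    reducer: Callable[[str, Iterable[int]], int],
) -> Dict[str, int]:
    groups: Dict[str, List[int]] = defaultdict(list)
    # Map phase: apply the mapper to every input record.
    for key, value in inputs.items():
        for k, v in mapper(key, value):
            # Shuffle: group intermediate values by key
            # (the real system does this over the network).
            groups[k].append(v)
    # Reduce phase: one reducer call per distinct intermediate key, in sorted key order.
    return {k: reducer(k, groups[k]) for k in sorted(groups)}


docs = {"a.txt": "the quick fox", "b.txt": "the lazy dog the end"}
print(run_mapreduce(docs, map_fn, reduce_fn))
# {'dog': 1, 'end': 1, 'fox': 1, 'lazy': 1, 'quick': 1, 'the': 3}
```

The programmer writes only `map_fn` and `reduce_fn`. Everything in `run_mapreduce` (partitioning, grouping, scheduling, and in the real system fault tolerance) is the framework's job, which is why the model appeals to programmers despite its narrow shape.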
Posted in Computer Science | 7 Comments »
February 4th, 2009 chris
The first paper on the extremely interesting list of distributed systems readings is the Google File System. It is not a particularly new paper, published way back in 2003 at the Symposium on Operating Systems Principles (SOSP), but it is still a great read. I’ve chosen to start with this paper over the other, more recent papers, since it is both the earliest chronologically and the lowest layer of abstraction. If you have never heard of it before and have even the vaguest interest in file systems, then read on!
Posted in Uncategorized | 2 Comments »
February 4th, 2009 chris
So I’ve finally moved onto my own hosting, and now that the blog has moved over, I figured I’d finally take care of it the way I should. I always hate to say that I’ll be posting more often, since I always say it and so far I haven’t been great about it. However, a while ago I read a good post on one of my favorite programming sites about how to tune WordPress and make it “sing”. Check out how you can modify your MySQL conf file to work better with WordPress.
Posted in Uncategorized | No Comments »
February 3rd, 2009 chris
For those of you who haven’t checked it out yet, highscalability.com reports on a lot of the fun, wacky technologies behind websites you’ve all come to know and love, and on how those sites scale to the ridiculous amounts of traffic that come with their popularity. Nearly every article keeps rebuttals to its points on the same page, so it ends up being a fun way to dive into tech you’ve never seen before. They also do a good job of scouring the net and aggregating good distributed systems material in one place. My favorites so far:
Read the rest of this entry »
Posted in Cloud Computing | No Comments »