Nowadays it seems like every sequel gets a bad treatment. They totally butcher the story and blah blah blah you’ve heard this a million times already. But there is no doubt that God of War 3 will stay true to its roots:
Today’s read is something quite outside the norm of what I usually talk about. It came to me through one of my favorite podcasts, Common Sense with Dan Carlin. Dan caters to all sides of the political spectrum (and by his own account he’s infuriated all of them repeatedly over the years), and on a recent show he had on Vincent Bugliosi. Most of Bugliosi’s work is before my time, but he is well known for prosecuting the Charles Manson case and has a nearly flawless legal record (the book cites him as winning 105/106 felony jury trials). On this show he was on to make a legal argument, though not about somebody you’d normally expect to hear about being prosecuted for murder:
This quarter Brian Drawert, Matthew Norman, and I have been investigating how viable the MapReduce programming paradigm is for scientific computing applications. We’ve been porting many common scientific algorithms to run on MapReduce to see how well they work. We’ve implemented a subset of the NAS Parallel Benchmarks and have found a number of interesting results (though for many of you these results will be fairly intuitive).
Just in case you were interested, I gave a talk last week at UCSB covering everything having to do with cloud computing. Unfortunately, I was only able to tape the first ten minutes of it, so the video pretty much just covers virtualization and some basic introductory material. Here are the slides if you want to follow along at home or see what I talked about after the video ended. Enjoy!
Now that we’ve covered the most popular (thus far) part of MapReduce’s life, let’s move on to its present and uncertain future. This time around we’ll cover extremely new material starting this year and will try to avoid rampant speculation about the future since there tends to be a high probability of it being flat out wrong. This article is primarily concerned with two topics: a new paper out that will be discussed at this year’s SIGMOD conference (A Comparison of Approaches to Large-Scale Data Analysis), and Hadoop Streaming, a fun new way to play with MapReduce that’s being used in some serious new cloud projects.
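To give a feel for what Hadoop Streaming asks of you, here is a minimal sketch of a word-count job in Python. Streaming runs any executable as the mapper or reducer: each one reads lines from stdin and writes tab-separated key/value lines to stdout, and the reducer sees its input sorted by key. The function names and the `run_locally` helper below are made up for illustration; they simulate the map, sort, and reduce steps in a single process rather than launching a real Hadoop job.

```python
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' line per word, as a streaming mapper would print."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(sorted_lines):
    """Sum the counts for each word; relies on the input being sorted by key."""
    pairs = (line.split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


def run_locally(lines):
    """Hypothetical helper: map, sort (standing in for the shuffle), reduce."""
    return list(reducer(sorted(mapper(lines))))


if __name__ == "__main__":
    for result in run_locally(["the cat", "The dog the"]):
        print(result)
```

In a real job, the mapper and reducer would live in two separate scripts that loop over `sys.stdin`, and the framework would handle the sort in between.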
…if you’re a sysadmin, of course. pconsole is a pretty cool utility that lets you enter your *nix commands on one machine and have them run wherever you need them to. It connects to terminal windows that are already open and connected to the machines you want to run the commands on, so the output is broken up very nicely. So if there’s a problem running a command on one machine, it’s really easy to know which machine it is. And it works on Mac OS X out of the box (almost)! All you have to do is run “chmod +x /usr/local/bin/pconsole” once you’ve installed it. I think the obligatory picture from the pconsole site pretty much sums up how cool it is (although it’s hard to really know it until you’ve racked your brains copying and pasting to many, many machines).
Thanks, pconsole, and thanks to Server Fault! It’s the new sister site to programming Q&A site StackOverflow and looks pretty nice so far. Since I’m more in programmer-land than sysadmin-land I don’t get quite as much use out of Server Fault as StackOverflow, but it did lead me to this gem called pconsole!
Recently for one of my classes I got the chance to research a parallel programming technique and its applications. Being a bona-fide MapReduce whore-in-training, I naturally had an easy choice. Since this is “Part 2”, we cover the era I’ll dub the “Return of MapReduce”, in which Google re-introduces and rebrands MapReduce as the technique we love to hate and love to love (choose whichever suits you best) and new advancements are made with it. This covers about the 2004-2008 time period, in contrast to Part 1 (the beginning of existence-2004), where map-reduce exists as part of Lisp, MPI, and other environments but is not particularly accessible to the average programmer, and Part 3 (2009-present), which I will leave as a surprise for those of you diligently following along at home.
With that said, here’s part 2, beginning with the paper that changed MapReduce and revived mainstream interest in the technique. Enjoy!
One of the most viewed posts we’ve written here is our post on MapReduce from a little while ago, so when picking out the next paper to look over, I thought something related to it would be optimal. With that in mind, Yahoo! Research has a relatively new paper that they published in SIGMOD ’08, titled Pig Latin: A Not-So-Foreign Language for Data Processing. Pig Latin bills itself as a natural progression of MapReduce in several ways, and indeed looks pretty interesting.
Over the last few months we’ve been working away on something that we think is pretty cool, and just two weeks ago we finally released the first version of it (which was naturally followed up by another release to fix the bugs in the first). It’s something we call AppScale, a platform on which you can run Google App Engine apps. But how does it differ from the platform that Google gives you to run App Engine apps on your local computer or the platform they host it in? Let’s explore that together!
Now that we’ve spent more than enough time looking over Google’s highly scalable infrastructure, let’s turn our attention to an even newer paper from Amazon. Their datastore, dubbed Dynamo, is an interesting contrast to Google’s work and raises many interesting questions and points worth noting. Here’s the executive summary:
Dynamo is built to be highly available, and sacrifices the traditional notion of consistency in order to do so.
Yet the paper itself perhaps gives a better one-liner:
Dynamo can be characterized as a zero-hop DHT, where each node maintains enough routing information locally to route a request to the appropriate node directly.
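To make “zero-hop” a bit more concrete, here is a rough sketch of the idea in Python: if every node keeps the full membership list locally, it can work out which node owns a key without asking anyone else. This is not Dynamo’s actual implementation; the `ZeroHopRing` class, the node names, and the use of MD5 as a convenient stable hash are all assumptions for illustration, and it leaves out replication, virtual nodes, and membership changes entirely.

```python
import bisect
import hashlib


def ring_position(key: str) -> int:
    """Map a string to a position on the hash ring (MD5 used as a stable hash)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class ZeroHopRing:
    """Hypothetical local routing table: every node holds the whole ring."""

    def __init__(self, nodes):
        # Sort nodes by their ring position so lookups can use binary search.
        self._ring = sorted((ring_position(name), name) for name in nodes)
        self._positions = [position for position, _ in self._ring]

    def owner(self, key: str) -> str:
        """Return the first node clockwise from the key's position."""
        index = bisect.bisect_right(self._positions, ring_position(key))
        # Wrap around to the first node when the key falls past the last one.
        return self._ring[index % len(self._ring)][1]


if __name__ == "__main__":
    ring = ZeroHopRing(["node-a", "node-b", "node-c"])
    print(ring.owner("shopping-cart:1234"))
```

Because the lookup only touches local state, any node that receives a request can forward it straight to the owner in a single step, which is the contrast with DHTs that route through several intermediate nodes.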