This quarter Brian Drawert, Matthew Norman, and I have been investigating how viable the MapReduce programming paradigm is for scientific computing applications. We've been porting common scientific algorithms to run on MapReduce to see how well they perform. So far we've implemented a subset of the NAS Parallel Benchmarks and found a number of interesting results (though for many of you these results will be fairly intuitive).
The specifics are all outlined in great detail in the paper linked above, so I won't reiterate them at that level of detail here. This post mostly serves as a summary to whet your appetite and get you interested in the paper, or to save you the time of reading it if you find this uninteresting.
That being said, let's summarize! MapReduce is aimed at solving embarrassingly parallel problems, so the two benchmarks I implemented (both embarrassingly parallel) were easy to write and ran pretty quickly. All of our results came from just four virtual machines, so we will definitely be adding more boxes over the summer and seeing how much the numbers improve (that is, whether we get close to a linear speedup, and if not, why). Both of my algorithms used a pseudo-random number generator whose numbers in the series can be generated independently of one another, so the work was incredibly easy to farm out to Map and Reduce tasks.
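To give a flavor of why this kind of benchmark maps so cleanly onto the paradigm, here's a minimal Monte Carlo sketch in the same spirit. This is not our actual benchmark code or a real Hadoop job: the per-task seeding below is a stand-in for a proper skip-ahead PRNG, and all the names are hypothetical.

```python
import random

def map_task(task_id, n_samples):
    # Each map task seeds its own generator, so no coordination between
    # tasks is needed (a stand-in for a skip-ahead/leapfrog PRNG).
    rng = random.Random(task_id)
    # Count darts landing inside the unit quarter circle.
    return sum(1 for _ in range(n_samples)
               if rng.random() ** 2 + rng.random() ** 2 <= 1.0)

def reduce_task(partial_hits, n_tasks, n_samples):
    # The reducer just sums the partial counts: tiny communication cost
    # relative to the computation done in the map phase.
    return 4.0 * sum(partial_hits) / (n_tasks * n_samples)

n_tasks, n_samples = 8, 100_000
partials = [map_task(t, n_samples) for t in range(n_tasks)]
pi_estimate = reduce_task(partials, n_tasks, n_samples)
```

The key property is that each map task is fully independent and the reduce step is a trivial aggregation, which is exactly the shape MapReduce rewards.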
One important takeaway from this project was quantifying how badly MapReduce handles tasks that don't provide enough computation to justify the communication costs, or that are iterative. Algorithms that are both suffer extremely poor performance and need to be substantially reworked before they become viable options for the community. Our Conjugate Gradient implementation runs in seconds under MPI but takes hours in MapReduce (three hours on a 200×200 matrix, to be precise), though this is not necessarily MapReduce's fault. Conjugate Gradient (not my implementation, of course) requires many iterations before it converges on a solution, and since each iteration was implemented as its own inefficient MapReduce job, the overhead compounded and made the whole thing unusable.
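To see where the iteration bottleneck comes from, here's a bare-bones Conjugate Gradient in plain Python (a textbook sketch, not our benchmark code). Every iteration reads the vectors produced by the previous one, so in MapReduce each loop body becomes its own job, paying the full job-launch and shuffle cost every time around.

```python
def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    # Textbook CG for a symmetric positive-definite A (list of rows).
    # The loop-carried dependency on x, r, p is what forces one
    # MapReduce job per iteration.
    n = len(b)
    dot = lambda u, v: sum(ui * vi for ui, vi in zip(u, v))
    matvec = lambda M, v: [dot(row, v) for row in M]
    x = [0.0] * n
    r = b[:]            # residual of the zero initial guess
    p = r[:]
    rs_old = dot(r, r)
    for it in range(max_iter):
        Ap = matvec(A, p)
        alpha = rs_old / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        if rs_new < tol:
            break
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x, it + 1

A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x, iters = conjugate_gradient(A, b)  # converges in 2 iterations here
```

In MPI the whole loop lives in one long-running process, so each iteration costs only a matrix-vector product and a couple of reductions; in MapReduce it costs a fresh job.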
This is a common theme we see in MapReduce versus other parallel or distributed programming environments: the code itself is generally much easier to write since it's serial, but there's now a lot more complexity at the program level. It's a common adjustment users make when learning to program in MapReduce: they must now think about how Map tasks and Reduce tasks interact instead of how to schedule computation, and neither approach is provably better (it's just different).
That being said, I do believe there is hope for MapReduce on scientific applications. We've had great success with Hadoop Streaming, programming many standard matrix operations in many different languages (Ruby, Python, Perl, Java), and would love to open-source it all for the community once it's in a much more usable state (performance-wise, that is). As always, stay tuned and I'll keep you up-to-date on it!
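For the curious, a matrix operation under Hadoop Streaming looks roughly like the sketch below: a matrix-vector multiply split into a mapper that emits partial products and a reducer that sums them per row. This is a hypothetical sketch, not our actual code; the tab-separated `row col value` input format and the assumption that the vector x fits in every mapper's memory (e.g. via the distributed cache) are both mine, and the shuffle is simulated in-process rather than left to Hadoop.

```python
def mapper(matrix_lines, x):
    # Emit (row, a_ij * x_j): one partial product per nonzero entry.
    # Assumes lines of the form "row<TAB>col<TAB>value".
    for line in matrix_lines:
        i, j, a = line.split("\t")
        yield int(i), float(a) * x[int(j)]

def shuffle(pairs):
    # Stand-in for Hadoop's sort/shuffle phase: group values by key.
    groups = {}
    for key, value in pairs:
        groups.setdefault(key, []).append(value)
    return sorted(groups.items())

def reducer(grouped):
    # Each reduce call sums the partial products for one row of y = Ax.
    return {row: sum(vals) for row, vals in grouped}

lines = ["0\t0\t2.0", "0\t1\t1.0", "1\t0\t3.0", "1\t1\t4.0"]
x = [1.0, 2.0]
y = reducer(shuffle(mapper(lines, x)))
# y[0] = 2*1 + 1*2 = 4.0; y[1] = 3*1 + 4*2 = 11.0
```

One nice property of Streaming is that the mapper and reducer are just programs reading stdin and writing stdout, which is why we could try the same operations in Ruby, Python, Perl, and Java with so little friction.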
P.S. Arbitrary precision in Java sucks and is great in Ruby. Why can't I take logarithms of Java's BigDecimals? I tried implementing my own log function to get around this using a super-naive bisection-method-like algorithm, but failed utterly when I found out that the power function can only compute integer powers. There are other complaints I could make, but I'll save them for another time.