Byzantine Reality

Searching for Byzantine failures in the world around us

Articles tagged with 'java'

My First Run with the X10 Programming Language

Not so long ago I discussed my experiment to learn the X10 programming language as part of my “learn one programming language per year” project. Having constructed my first non-trivial application in X10 recently, I am now ready to offer a preliminary opinion about its usability and its pros and cons. With that said, let’s take a look into one possible future for concurrent programming.

X10 follows the Partitioned Global Address Space (PGAS) model that languages such as Unified Parallel C (UPC) have taken before it. Like UPC, it has global arrays distributed over processors, but the semantics of these arrays are quite a bit different. This is because computation is the first-class citizen of X10, while data is a second-class citizen. Don’t get me wrong, data operations are done very nicely in X10 (with great support for multi-dimensional arrays), but fundamentally this appears to be in order to make the computation that uses the data look nicer. With that, computation is really nice – if you want to run some code at a particular thread/process (known in X10 speak as a ‘Place’), just say

at (place) { do something }

and you’re good to go! But since data is a second-class citizen, making sure that the data you want is available at the right place or how to move it there is the big paradigm shift that X10 requires you to learn.

In my particular case, I was converting some MPI code I had, which did a distributed power method computation, to X10. Just as when I converted this code to UPC in order to learn that language, a similar (but certainly not the same) inversion of thought was necessary to do this. And inevitably, another inversion of thought is needed to change that code into something that runs quickly. But that’s not a bad thing – it’s just the learning process associated with a new programming language.

A while ago I was stuck in a vi/emacs or GTFO mindset, and being in this mindset made X10 development very difficult. I had initially run into X10 at a developer day IBM put together back in March and thought that it was different from what I had seen before but not too different, such that it might actually be a language I could get real work done in. However, X10 falls into a similar level of verbosity to C#, which is just enough where vi is a bit counter-productive. So having got over that mentality since my initial encounter with X10, I embraced Eclipse (but just for X10 development so far) and was able to really get into it. Thus, throughout this project, I was inherently at a disadvantage of having to learn both Eclipse and X10 – the interaction of which is a big part of this review.

With that said, I was able to complete my Power Method code and run it over a cluster of machines (as one of the backends compiles to MPI).

What I liked:

  • In many cases, X10 is more succinct than Java, and the type inference for immutable variables (vals) does a good job. If you’re using mutable variables (vars), however, you still have to specify the type by hand, which is about the same amount of code as Java.
  • Back in X10 2.0.6, the tutorials were the best I have ever seen – in Eclipse there was about two hours’ worth of tutorials explaining all the new concepts and what differs from Java. They made the language easy to pick up since there were many things that could be molded into the code I needed and oftentimes showed me what I was doing wrong.
  • The concurrency abstractions are pretty cool and nice and high-level – need to fire off a lot of threads and block until they’re done? Just use ‘async’, give it a block of code, and slap that in a ‘finish’ block! Most programming languages offer some version of this – Ruby’s is pretty good for example but in contrast is not thread-safe by default – but this is certainly the cleanest I’ve seen thus far.
  • X10 easily supports mixing its code with Java or C++ libraries / code – haven’t needed to use this myself, but a cool selling point. The syntax doesn’t look too bad for this, but since the point of this experiment was to get away from Java and C++, I don’t see anything immediately compelling me to use this. Similarly, they have alpha support for compiling to CUDA code for running on GPUs – again, I don’t have a GPU sitting around to try this on, but looks very interesting.
  • The Eclipse support (called X10DT) is done very nicely – there’s just a plugin that natively interfaces with Eclipse that has installed for me without any problems over the last two releases I’ve played around with.
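
For readers coming from Java, the ‘finish’/‘async’ pattern mentioned above corresponds roughly to starting a batch of tasks and blocking until every one of them completes. The following is a Java sketch of that idea, not X10 code; the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicLong;

public class FinishAsync {
    /** Sums the squares of 1..n, computing each square in its own task. */
    public static long sumOfSquares(int n) {
        AtomicLong total = new AtomicLong();
        List<CompletableFuture<Void>> tasks = new ArrayList<>();
        for (int i = 1; i <= n; i++) {
            final long value = i;
            // Roughly X10's "async { ... }": start the work and move on.
            tasks.add(CompletableFuture.runAsync(() -> total.addAndGet(value * value)));
        }
        // Roughly the end of X10's "finish { ... }": block until every task is done.
        CompletableFuture.allOf(tasks.toArray(new CompletableFuture[0])).join();
        return total.get();
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(10)); // 1 + 4 + ... + 100 = 385
    }
}
```

Compared with the two keywords X10 needs, the bookkeeping here (collecting futures, joining on them, using an atomic counter) is a fair picture of what the language abstracts away.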

What snagged me or could have been improved:

  • BadPlaceException – get used to seeing this fellow all the time once you start doing non-trivial programs. As the name suggests, a thread is trying to access data it doesn’t own, which causes it to violently explode. Debugging this is a pain since the line number in your X10 code doesn’t show up in the stack trace (the line number in whatever code it generates does show up, but that’s not helpful to me). Eclipse may have some debugging support integrated for this, but as I’m new to Eclipse, I have no idea what it is and the X10 documentation doesn’t appear to mention it. This was a double pain for me since once I was done getting rid of these on my local machine, I tried running it over a cluster of machines only to run into the same problem again – which, without a line number or debugging support, caused me to revert to the well documented “just put print line statements everywhere” strategy.
  • Remember how I said the tutorials were really good in X10 2.0.6? Well I upgraded to the current version (X10 2.1) and all that cool documentation vanished, leaving me with the language reference. Now I have to specifically know the classes I want info on and use this and the X10 Standard Library documentation – essentially JavaDocs. It’s not that bad for me since I already got my first grasps on X10 with the cool tutorials, but I think it hurts new users not to have it. Interestingly, a new updated tutorial on X10 2.1 was just put out today that has a lot of great info – check that out here.
  • Syntax highlighting takes forever, and oftentimes error messages don’t go away until I save my code and let it do a pass over my code. Presumably future releases will fix this, but for now I just don’t trust the messages unless I save first and wait a few seconds for it to tell me what the real errors are.
  • Lack of auto-completion support – I figured that since I was using Eclipse I could do the usual hot-key and do method completion or it would show up whenever I hit the “.” on my keyboard, but alas, the current version doesn’t have it. Again, this is likely due to the newness of the project or relative unimportance compared to more pressing issues, but since the rest of the “real languages with IDE support” have this, I’d expect X10 to have this in the next release or two.
  • Putting my computer into sleep mode occasionally causes code to become not editable, which as you would suspect is a big pain in the ass. Interestingly, the code is always saved, so I just close Eclipse and re-launch it, but then I have to re-open the Eclipse X10 Help Menu to get back to the language reference (which takes 10-20 seconds to load), and get back to where I was before, and so on. Thankfully it doesn’t happen too often, and hopefully a future release will address this.
  • The Java backend sets the number of Places to 4 by default, and when I ran it over MPI with the C++ backend, it ran with a single Place. Since my code wasn’t tested on that, it naturally didn’t work (BadPlaceException) and I couldn’t figure out how to run it over the Java backend (which runs really fast) with a single Place. I struggled for quite a while looking at environment variables to set and after trying what I thought were all of them when compiling or running my code, I found my solution: just run it with MPI on the C++ backend with
    -n 1 -np <number of places>

    And that will do it! Debugging this still tends to be a pain since I’m now back to vi and 10-20 second compile times, but now I have some great code that looks a bit nicer than my MPI code. Since setting the number of Places is quite an important setting, I would also expect the documentation on how to set this to be improved by the next release or two.

Let’s wrap things up with an analogy: X10’s concurrency support is to MPI as Java’s garbage collection is to C++. MPI can give you super-fast code, but you’ll need to be a master coder to do so, and then it will likely only be optimized on whatever cluster your code was constructed on, just like how your C++ code will be memory-leak free only if you’re a pro at it – and in both cases, even the coding masters have had horribly slow code in MPI or leaky programs in C++. In the same way, Java and X10 offer much better programmer productivity since most of us aren’t memory management experts or gurus of concurrent programming (fun times with race conditions, shared program state, and so on).

X10 mixes things up just enough to be interesting, so it’s a project I will certainly be keeping an eye on and using in projects that permit it. If you have code you need to run concurrently or haven’t learned a new language recently, I definitely recommend checking it out!

The Perennial Java Problem

The hypothetical “good programmer” should learn a new programming language every year in order to keep up with the latest and greatest in the programming world and keep their brains nice and fresh. This is of course old news to anyone who has read The Pragmatic Programmer and similar works but it doesn’t hurt to bring it up every once in a while! When working on AppScale I got to learn Ruby and was infinitely happier for doing so, and while AppScale also uses Python I don’t feel as though it’s sufficiently different to justify calling that my language for this year. So with that said, I investigated what I will politely refer to as “the Perennial Java Problem”:

The Perennial Java Problem: Find a programming language that sucks less than Java but can still be used for real work.

If you are a programmer who has used Java, you likely already know what I’m talking about: Java is great but has some serious shortcomings. For starters, using Java without Eclipse or NetBeans is a death wish. This is already lethal for a person like me, who had fallen into the “vi/emacs or GTFO” mentality. Disagree with me? Try programming the sample apps for the Android without Eclipse. I tried it, succeeded, and learned a lot for my trouble, but it’s truly a second-rate experience compared to using Eclipse to do the same job. So why do you need Eclipse for this? Because (insert drum roll sound here) Java is ridiculously verbose. I know, you already knew that. But the sheer complexity of the jobs Java now supports means that you can BE A BIG MAN and use vim and ant and the COMMAND LINE to take an hour to run the first iteration of a note pad application on the Android or you can spend five minutes doing it in Eclipse. Sheer pragmatism demands that you break out of the “vi/emacs or GTFO” mindset if you want to program in certain languages.

As far as I’m concerned, Ruby solves the Perennial Java Problem for me well enough – it is simple enough that I can actually use vim to program in it and not regret the decision until the day I die. It can do the things I actually need to do to get work done with extreme ease – read and write files in less than a dozen lines of code, write a web application that I can actually use in an hour, and the inevitable “throwaway scripts” that Ruby excels at that are just enough of a pain in the ass that I can’t use Python or Java for. Granted, I still give the fantastically worthless award of “best web framework” to Python Google App Engine’s webapp, since it truly embodies the spirit of you have work to do so I’m going to make this as simple as possible for you. However, reading/writing files and shell scripting is still a pain in Python compared to Ruby (To the Pythonistas out there: it’s not that bad – it’s just not as easy as Ruby).

So why don’t I just go get married to the Ruby programming language then, since I’m obviously so in love with it? Well, although I obviously love Ruby, there still is the whole “learn a programming language a year or your brain will rot and all you will do is think Ruby until the day you die” issue. This nicely brings us back to the (modified) Perennial Java Problem:

The Perennial Java Problem: Find a programming language that you don’t already know that sucks less than Java but can still be used for real work.

Historically speaking, the latest and greatest programming techniques are inevitably stolen from functional programming languages (e.g., garbage collection, dynamic typing), so why not just cut out the middleman and learn a functional programming language? Well it’s because of that last bit of the problem: the language needs to be one that I can do real work in. And the experience for me with a functional programming language is always very similar to the following:

  1. Invention: The Gods of Functional Programming have bestowed upon us mere mortals a new functional programming language, X.
  2. Sales Pitch: This language can compute a lazy infinite set of integers IN ONE LINE IN PARALLEL OMG
  3. Sold: I immediately decide that this language is amazing and all others are immediately inferior
  4. Disappointment: I can’t figure out how to read or write a file in less than a dozen lines without breaking all the fundamental concepts the language holds dear to its heart.
  5. Depression: I return to the previous programming language I was using and am unable to bring back anything profound from the functional programming world to it.

Part of the reason that functional programming languages haven’t clicked with me is points (3) and (4) – I’ve decided to use this language for everything and maybe I shouldn’t. This is partially my fault: it’s easy to say X is great for this so let’s use it for everything. For Ruby this works fine most of the time but for Java not so much (specifically with respect to shell scripting). When approaching functional programming languages I (perhaps mistakenly) approach them with the mindset of how is this an improvement over what I’m using now? And since they (perhaps naturally) fall short on the things that I need to do for my work, maybe I’m setting myself up for failure with them.

Let’s then look at a (relatively) new programming language and examine exactly what problems it solves: X10. It’s one of the many “runs on JVM” languages but has the following features that I find particularly interesting:

  • First-class functions and anonymous functions – functions can be passed around to other functions, stored in variables, or not if you don’t want to.
  • Domain specific: It aims to solve the problem of writing concurrent applications – so it doesn’t aim to replace Ruby/Python for your shell scripting and web app needs, but bills itself as the language to use when you have some serious computation to do.
  • Compiles to Java bytecode for serial execution mode or (more interestingly) to C++ to be deployed over MPI
  • If compiling to Java bytecode, can interact with Java code
  • Looks very much like what UPC was for C – adding simple but powerful PGAS abstractions

The second point is the crucial one from what we’ve been talking about here: it aims to do concurrent / distributed programming very well and not so much of anything else. This is fine by me, as I can still stick by my favorite Ruby for all other tasks and use X10 for the tougher work. Furthermore, its Java interoperability is a nice touch should I really happen to need something from Java-land (although I can’t think off the top of my head what I would need in this programming domain).

Will X10 “suck less than Java for doing real work”? Time will tell. But initial hopes are high – the Eclipse integration is very nicely done and the documentation in there is extremely helpful. And while X10 looks like it will be a “need to use an IDE” language, that’s fine by me as long as it gets the job done.

MapReduce on Scientific Apps

This quarter Brian Drawert, Matthew Norman, and I have been working on seeing how viable the MapReduce programming paradigm is for scientific computing applications. We’ve been porting over many common scientific algorithms to run over MapReduce and see how well they work. We’ve implemented a subset of the NAS Parallel Benchmarks and have found a number of interesting results (but for many of you these results will be fairly intuitive).

Original resources: Class webpage, slides, paper

The specifics are all outlined in great detail in the paper linked above, so I won’t reiterate on quite the level of detail presented there. This mostly serves as a summary to whet your appetite and get you interested in the paper or save you the time of reading it if you find this uninteresting.

That being said, let’s summarize! MapReduce is aimed at solving embarrassingly parallel problems, so the two benchmarks I implemented (which were embarrassingly parallel) were easy to implement and ran pretty quickly. All of our results ran over four virtual machines, so we will definitely be adding in more boxes over the summer and seeing how much the numbers improve (that is, if we get close to a linear speedup or why not). Both of my algorithms used a pseudo-random number generator that could generate numbers in the series independently of each other, so the work was incredibly easy to farm out to Map and Reduce tasks.
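
To give a feel for what “generating numbers in the series independently of each other” can look like, here is a Java sketch of a linear congruential generator with skip-ahead: any task can jump straight to the n-th state without computing the ones before it. This is not our benchmark code; the class name, the multiplier (5^13), and the modulus (2^46) are assumptions picked for illustration.

```java
public class SkipAheadLcg {
    // Assumed constants for illustration only.
    static final long A = 1220703125L;        // multiplier, 5^13
    static final long MASK = (1L << 46) - 1;  // x & MASK is x mod 2^46

    /** One ordinary step of the generator: x -> A * x mod 2^46. */
    static long next(long x) {
        // Java long multiplication wraps mod 2^64, which keeps the low 46 bits exact.
        return (A * x) & MASK;
    }

    /** State after n steps from seed, computed as A^n * seed mod 2^46 by repeated squaring. */
    static long jump(long seed, long n) {
        long result = seed & MASK;
        long base = A;
        while (n > 0) {
            if ((n & 1) == 1) {
                result = (result * base) & MASK;
            }
            base = (base * base) & MASK;
            n >>= 1;
        }
        return result;
    }

    public static void main(String[] args) {
        long seed = 271828183L; // arbitrary odd seed, also an assumption
        long stepped = seed;
        for (int i = 0; i < 1000; i++) {
            stepped = next(stepped);
        }
        // Both routes should land on the same state.
        System.out.println(stepped == jump(seed, 1000));
    }
}
```

With something like this, Map task k can start at jump(seed, k * chunkSize) and generate its own chunk without talking to any other task.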

One important takeaway from this project was to quantify how bad MapReduce was at tasks that don’t provide enough computation to justify the communication costs or are iterative. Algorithms that are both suffered from extremely poor performance and need to be substantially reworked before they become viable options for the community. Our Conjugate Gradient algorithm runs in seconds on MPI but simply takes hours in MapReduce (three hours on a 200×200 matrix to be precise), but this is not necessarily MapReduce’s fault. Conjugate Gradient (which I did not implement, of course) requires many iterations before it converges on a solution, and since each round was implemented through inefficient MapReduce jobs, this compounded the poor running time and made it unusable.

This is a common theme that we see in MapReduce versus other parallel or distributed programming environments: it’s generally much easier to write the code since it’s serial, but now there’s a lot more complexity at the program level. This is a common adjustment users make when learning to program in MapReduce; now users must be concerned with how Map tasks and Reduce tasks interact instead of how to schedule computation, and it’s not something where one is provably better (it’s just different).

That being said, I do believe there is hope for MapReduce on scientific applications. We’ve had great successes with Hadoop Streaming and programming many standard matrix operations in many different programming languages (Ruby, Python, Perl, Java) and would love to open-source it all for the community once it’s in a much more usable state (performance-wise, that is). As always, stay tuned and I’ll keep you up-to-date on it!

P.S. Arbitrary precision in Java sucks and is great in Ruby. Why can’t I do logarithms on Java’s BigDecimals? I tried implementing my own log function to get around this using a super-naive bisection-method-like algorithm, but utterly failed when I found out that the power function can only compute integer powers. There are other complaints I can make, but I will save them for another time.
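
For what it’s worth, one way around the integer-only power function is to skip powers entirely and sum a series. The sketch below is not the bisection approach I tried; it uses the identity ln(x) = 2 * (y + y^3/3 + y^5/5 + ...) with y = (x - 1) / (x + 1). The class name, the guard-digit count, and the term cap are all assumptions, and the series converges slowly when x is far from 1.

```java
import java.math.BigDecimal;
import java.math.MathContext;

public class BigLog {
    private static final int MAX_TERMS = 1_000_000; // arbitrary safety cap (assumption)

    /** Natural log of x via ln(x) = 2 * sum over k of y^(2k+1) / (2k+1), y = (x-1)/(x+1). */
    public static BigDecimal ln(BigDecimal x, MathContext mc) {
        if (x.signum() <= 0) {
            throw new ArithmeticException("ln requires a positive argument");
        }
        // Work with a few extra digits so rounding noise stays out of the result.
        MathContext work = new MathContext(mc.getPrecision() + 10);
        BigDecimal epsilon = BigDecimal.ONE.movePointLeft(work.getPrecision());

        BigDecimal y = x.subtract(BigDecimal.ONE).divide(x.add(BigDecimal.ONE), work);
        BigDecimal ySquared = y.multiply(y, work);
        BigDecimal power = y;               // holds y^(2k+1)
        BigDecimal sum = BigDecimal.ZERO;

        for (int k = 0; k < MAX_TERMS; k++) {
            BigDecimal term = power.divide(BigDecimal.valueOf(2L * k + 1), work);
            if (term.abs().compareTo(epsilon) < 0) {
                // Remaining terms are below the working precision.
                return sum.multiply(BigDecimal.valueOf(2)).round(mc);
            }
            sum = sum.add(term, work);
            power = power.multiply(ySquared, work);
        }
        throw new ArithmeticException("series did not converge; argument too far from 1");
    }

    public static void main(String[] args) {
        System.out.println(ln(new BigDecimal("2"), new MathContext(30)));
    }
}
```

A more serious version would first shrink the argument toward 1 (for example by pulling out powers of two or ten) before summing, but that is beyond a P.S.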

Black Friday Sale = More Books!

So as you’ve noticed I’ve fallen off the blog posting bandwagon. Much apologies! But with Black Friday behind us I’ve picked up two new books at awesome prices:

  • Programming Ruby 1.9: I already have the older version covering 1.8 and although a decent chunk of this is a re-hash, it’s nice to look over it again and see a much more comprehensive treatment of old material. It adds in a much greater reference on the Ruby standard library and a light discussion on Ruby metaprogramming and security features, which I’m looking forward to reading and writing about later.
  • Stripes: I’ve had my eye on this book since it came out but never really got around to getting it. The subtitle of the book is really what sold me on it: “and Java web development is fun again”. Wow. That’s pretty much all I needed after a year with Spring and Spring Web Flow. Don’t get me wrong, they were great at what they did, but they were a bit of a drag. I’ve already gone through the first chapter and done the “Hello World” example and am very pleased with what I’m seeing. Stay tuned for a review of this as well.

Finals are out next week, so it may be a little light on the posting until then. I’ll try to keep up though, but we shall see. Until then!

"Programming Clojure" Looks Intriguing

I’m an avid fan of the Pragmatic Programmer series of books, which is incredibly obvious to anyone who regularly reads this blog (which at this point is really just me). So they happen to have a ton of books that involve getting done what you need to get done in the least amount of time while keeping good maintainability and such. This has been the main motivation for me to get a lot of books they put out. But I stumbled across a new book entering beta soon that looks positively intriguing: Programming Clojure.

Clojure is a language I ran into a while ago while looking at languages that run on the JVM. It essentially is Lisp on the JVM. And this is a sufficiently interesting combination of languages when you think about it. Contrast this with Scala, which really feels more like a natural advancement to Java. It’s lighter on the syntax and infers types pretty well most of the time and is way less verbose (my main complaint with Java).

Clojure is a totally different beast. Clojure is a dynamically typed language (versus Java/Scala’s static typed) and is based on a language that has a weird long history. Lisp pioneered many language features that were really way ahead of its time (dynamic types, garbage collection, and many others) and are really only gaining traction now.

But I’m hoping the Clojure book will be different from what I’ve seen before. I’ve read SICP and just didn’t really get into Scheme. For a week I was all about it, but while the book was great, it really was just too stuck in abstraction-land. Don’t get me wrong. I LOVE ABSTRACTION. It’s the core of computer science, and if you can’t abstract then you can’t do computer science. Period. But I never found out how to do useful things in Scheme. I never found out how to read from a file and write to it. I never found out real-world examples of how Scheme would make my life easier. I thought it was really cool how Scheme made you think and program but just couldn’t get work done with it. Contrast this with Ruby, where I like how it makes me think and I get an anomalously high amount of work done in it.

So I have high hopes for Clojure. It seems to combine the cool abstraction powers of Lisp with the “actually gets work done” of Java. And yes, I am obviously aware that people have been getting tons of work done in Lisp for the billion years it’s been out. It’s just that the reference book of the gods on it didn’t seem to really touch on it that much and for some reason it never really clicked to me to go look it up (obviously my bad). The page on the book also claims to talk about software transactional memory, which sounds pretty interesting, and claims that Clojure is as fast as Java, which is also pretty cool.

Rest assured that when the book goes into beta I’ll be one of the first to pick it up and talk about it, and hopefully it lives up to expectations.

Simplicity, Round 2

I originally intended these random ‘Simplicity’ updates to be about things I like in Ruby that are a pain to do in Java. And while that’s certainly the case this time around, this comes from a library perspective rather than the language itself. Specifically, I’m talking about processing command-line arguments. It’s something you have to do all the time when you write these little scripts that come up but for some reason results in the same boilerplate code being constructed. This is why I’m exceptionally glad that Ruby has many solutions to this problem.

So, just to clarify, this is the problem. The user types in this:

./myprogram --file data.txt --optional-thing true --whatever 100

And I want to be able to make sure they put in certain flags, and that they meet some criteria, and get the data I need out of it. Coming from Java-land, I wasn’t taught anything in particular to do this, and Google really only reveals a useful how-to to write your own. Their solution seems to be the equivalent of writing an interpreter for your command line arguments, and is a bit over-the-top for something this simple.

Ruby, strangely enough, has at least half a dozen different competing libraries that specifically serve the purpose of parsing command-line arguments (two of which are built into the language). The one I chose was the first one that showed up on Google, OptiFlag. It bills itself as a Domain Specific Language for command-line arguments, which still sounds weird to me. However, their implementation is pretty solid. The way the JavaWorld article suggests seems pretty bulky, but with these guys, you download their package, include it in your program, and once you learn how to ‘speak’ their language, you can rip out everything you need in a line or two of code.

The documentation on OptiFlag is amazing too! The simple tutorial and clear pictures make it really easy to pick and grab whatever you need to do what you have to do, and the fact that arguments can be validated via regex is amazing. Not that anything is groundbreaking there, but the fact that you can saves more boilerplate code in the long run.
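
To make the boilerplate concrete, here is roughly what hand-rolling required flags plus regex validation looks like in Java. This is a minimal sketch of the do-it-yourself route, not the JavaWorld approach and not a port of OptiFlag; the class and method names are made up, and it assumes every flag takes exactly one value.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;

public class ArgParser {
    /** Turns "--name value" pairs into a map from name to value. */
    public static Map<String, String> parse(String[] args) {
        Map<String, String> flags = new HashMap<>();
        for (int i = 0; i < args.length; i += 2) {
            if (!args[i].startsWith("--") || i + 1 >= args.length) {
                throw new IllegalArgumentException("expected --flag value, got: " + args[i]);
            }
            flags.put(args[i].substring(2), args[i + 1]);
        }
        return flags;
    }

    /** Returns the value of a required flag, checking it against a regex first. */
    public static String require(Map<String, String> flags, String name, String regex) {
        String value = flags.get(name);
        if (value == null) {
            throw new IllegalArgumentException("missing required flag --" + name);
        }
        if (!Pattern.matches(regex, value)) {
            throw new IllegalArgumentException("bad value for --" + name + ": " + value);
        }
        return value;
    }

    public static void main(String[] args) {
        // e.g. java ArgParser --file data.txt --optional-thing true --whatever 100
        Map<String, String> flags = parse(args);
        String file = require(flags, "file", ".+");
        int whatever = Integer.parseInt(require(flags, "whatever", "\\d+"));
        System.out.println(file + " " + whatever);
    }
}
```

Even this stripped-down version is a few dozen lines before the program does any real work, which is the kind of thing a library collapses into a line or two per flag.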

If you do a lot of little scripts that use the command line, check out OptiFlag, and if you use Ruby and haven’t seen OptiFlag, go check it out. It’s definitely a different take on command-line argument processing.

First There Was Java...

For the lion’s share of my undergraduate education (80%-90%) we got to program in Java. We did half a semester in C# (close enough), a semester in C, a semester in assembly, and everything else was Java. Although I picked up other languages before I got to undergrad life, Java was the first language that I really learned well, and a lot of how Java does things seeped into my head about how to do things in every other language. It’s not a bad thing, but it made my head spin when I ended up seeing Ruby for sure.

This article has been in the works in my head ever since I read The Perils of JavaSchools, since I went to one. Early on in the article Joel pretty much captures the problem with only teaching Java:

Java is not, generally, a hard enough programming language that it can be used to discriminate between great programmers and mediocre programmers.

This is right on. And that’s not to say Java shouldn’t be taught in schools. But I don’t know if it should be the first language to teach students (I’ll get back to you on that) and it certainly shouldn’t be the only language that students end up really knowing.

We can debate what it means to “really know” a language, but as far as I’m concerned, it’s knowing the language’s standard library. You can easily “program” in a language like Perl even if you don’t know Perl but know another imperative language (e.g., C or Java), but you don’t really know Perl unless you know all those special metacharacters and Perl’s regex syntax and blah blah blah.

I think what I’m trying to get at is that I really wish I got to learn two languages from different paradigms. Something like Java and Lisp, or C and ML, but to a degree where if a problem comes up, I can take the extra five minutes to think about which language solves the problem best, rather than having to do it in Java because I’m just not productive enough in anything else. Part of this problem is relieved by me trying to learn Ruby well and insisting that I do everything in Ruby until I get a good grip on it (or unless Java is such a blatantly better choice).

I think the problem with having Java as the only programming language is that, until I really got a good look at Ruby, I looked at the basic way we were taught to get input from the user and thought it just had to be that complicated:

BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
String response = in.readLine();

As I’ve said, Java is not my only language. I know it’s a lot less work to do this in C or Perl. But since I don’t really have the same grip on C or Perl that I have on Java, it’s not something that really ever hit me. Yet now that I’m looking to know as much about Ruby as I do about Java (hopefully more), all these little things are raising red flags. Like in Ruby, to get a line of input from the user, it’s:

foo = gets()

Which is closer to how it’s done in Perl or C and much shorter than Java. And of course I could go on about how Ruby is way less verbose than Java (which I will do next time), but Ruby’s not the focus of today. Java is. Java has been a great language for me over the last how-many-years. But I think we can come to a consensus that, no matter how you feel about Java, it can’t be the only tool in your toolbox.

I also think that the only thing worse than not knowing a particular language is knowing it not that well. I spent about a year maintaining O’Caml applications, and I still can’t say that I know O’Caml. I can subconsciously look at O’Caml code and say what it’s doing, but I don’t know the O’Caml standard library. Google and the O’Caml docs are my best friend when I touch O’Caml, which just isn’t something that happens when I’m using Java. Sure, I still use Google, but I don’t need to go on message boards and forums to find out what a particularly weird error I’m having means. But this is getting off-topic. What I mean to say is that I don’t really know all the amazing features behind O’Caml. I know some subset of it that corresponds to the style of the coders whose code I maintained, and that’s about it.

I’ve decided for Ruby that that’s totally unacceptable. It shows a lot of promise and when I talk about Ruby, I want to at least appear like I know what I’m talking about. I want to approach a Ruby conversation with the same gusto that I have when I’m talking Java. And I wish that I had that knowledge with another language from my undergrad days. But there’s no time like the present, so off we go on Ruby!

Eucalyptus

A huge amount of buzz in the internets and especially cloud-computing land is about Amazon EC2. With EC2 you can go pay Amazon some money and get a nice little virtual computer with its own IP and all that fun stuff and throw up your web site on it. Other cloud computing vendors offer software that runs on it to make sure the apps you put on EC2 stay up no matter what (e.g., put a web site on it and make sure that no matter how much traffic it gets, it’s still able to stay functional).

But what if you wanted an open-source alternative? Enter Eucalyptus.

Disclaimer: Since Eucalyptus is a UCSB product and I’m at UCSB, I’m not entirely unbiased. But presumably you realize I’m biased and to some extent, you are too.

Eucalyptus is for people who have a cluster and want to run it in the same fashion as Amazon EC2. To do so, it’s API-compliant, meaning that the tools you use to talk to EC2 work exactly the same on Eucalyptus.

Eucalyptus puts a pretty front end around your cluster and virtualization tools in (presumably) a similar fashion as Amazon EC2. Virtualization is done via Xen but since they use the libvirt library, which claims to be virtualization-agnostic, you’ll be able to use other tools down the line.

You can see all this info and more in a presentation Rich Wolski gave at Velocity a few months ago, but my initial observations come from a different angle. I had wanted to try out EC2 for quite some time but wasn’t sure how much I’d have to pay to try it out. With Eucalyptus, if you happen to have a few (relatively recent) boxes lying around you can be up and going in no time. For free.

And the whole thing has been a giant learning experience for me. Learning about Xen, making images, all that stuff has been an awesome time. I’ve got a few images I’m going to upload to Eucalyptus soon and fiddle with, and it’s something I definitely recommend doing if you’ve got some spare time on your hands and a few boxes.

If you don’t, and you have some cash lying around (how much is uncertain to me), give EC2 a try. Let me know how much it takes you to get up and going and how it is.

Simplicity, Part 1

Just a random little mumbling for today: I was on the StackOverflow message forum and saw a question asking how to find your gateway’s IP in Java. The easy way is to do it via a shell command, but in Java this got brutally complicated, and while I was doing this, all I could think was “geez, how much simpler is this in Ruby?”

Here’s why the Ruby guys are all so happy (and why I’m frustrated at Java). Here’s the Java code to do it:

import java.io.*;
import java.util.*;

public class ExecTest {
    public static void main(String[] args) throws IOException {
        Process result = Runtime.getRuntime().exec("traceroute -m 1 www.amazon.com");

        BufferedReader output = new BufferedReader(new InputStreamReader(result.getInputStream()));
        String thisLine = output.readLine();
        StringTokenizer st = new StringTokenizer(thisLine);
        st.nextToken();
        String gateway = st.nextToken();
        System.out.printf("The gateway is %s\n", gateway);
    }
}

And here’s the Ruby code:

result = `traceroute -m 1 www.amazon.com`
result.each do |line|
  if line =~ /\d  (\d*.\d*.\d*.\d*)/
    puts "The gateway is #$1"
  end
end

Holy crap that’s a huge distance. And for those of you more into the numbers, here’s what word count has to say:

     lines   words   chars
         6      21     141 ExecRuby.rb
        14      50     502 ExecTest.java

This is a huge difference, but since numbers can be misleading, I’d rather measure the time it took me to write these: it took about 5 minutes to write the Java version, and half that to write the Ruby version. I’ve known Java for a while but since I’m a little rusty, I got hung up a bit on the little syntax stuff and dumb library crap. However, I don’t really know Ruby that well and am still learning it, and I still got it done in half the time!

A lot of this owes to Ruby having much simpler syntax. Many argue it’s the dynamic / duck typing, but C# 3.0 is evidence against that with its new local type inference feature. It’s just easier to get it done in Ruby (at least in this example), and I suspect this will come up often enough that the title of this post is numbered for possible future cases of this.
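To be fair to Java, part of the pain above comes from how I wrote it. Here’s a rough sketch of a slightly sturdier version that scans every output line with a regular expression instead of trusting the first one. The class name GatewayFinder, the helper names, and the -n flag (numeric output, so the hop shows up as an IP address) are my own choices for illustration, and the regex assumes the usual traceroute hop format.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GatewayFinder {
    // Assumed hop format: " 1  192.168.1.1  1.234 ms ..." (hop number, then an IPv4 address).
    private static final Pattern HOP =
            Pattern.compile("^\\s*\\d+\\s+(\\d+\\.\\d+\\.\\d+\\.\\d+)");

    // Returns the address on a hop line, or empty if the line is not a hop line.
    static Optional<String> parseGateway(String line) {
        if (line == null) {
            return Optional.empty();
        }
        Matcher m = HOP.matcher(line);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    // Runs traceroute with a one-hop limit and returns the first hop's address, if any.
    static Optional<String> findGateway(String host) throws IOException {
        Process p = new ProcessBuilder("traceroute", "-n", "-m", "1", host).start();
        try (BufferedReader out =
                     new BufferedReader(new InputStreamReader(p.getInputStream()))) {
            String line;
            while ((line = out.readLine()) != null) {
                Optional<String> gateway = parseGateway(line);
                if (gateway.isPresent()) {
                    return gateway;
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(findGateway("www.amazon.com")
                .map(g -> "The gateway is " + g)
                .orElse("No gateway found"));
    }
}
```

It’s still far more code than the Ruby version, which is rather the point.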

C, SWIG, and Java

As promised, today we’re going to look at SWIG. The basic idea of SWIG can be summarized as follows:

Got code in C or C++ that you want to use with your favorite language? Well then look no further! SWIG is the thing for you!

This is essentially the Wrapper Facade from POSA 2 in three sentences instead of many pages (although much less convincing than POSA 2 did it). We’ve been rambling about how they could only use C++ to make C code object-oriented, but that it’s not entirely their fault since they wrote the book in the year 2000. Last time we mentioned the ctypes library as an easy way for Python programmers to use C code, but the downside was you could only do this in Python.

SWIG aims to do what ctypes does for Python but for many other languages. It boasts that it can take C code and make it accessible to C#, Common Lisp, Ruby, and many others. 

As I've always wanted to wrap my C code in Java code, we’ll be looking at using SWIG to “automatically” generate Java Native Interface (JNI) code that we can then use to solve the same problem from last time. This is particularly appealing to me since I have no experience with writing JNI code and have heard that it is particularly unpleasant.

That being said, a quick recap is in order, especially if you missed our last post. We want to take some C code that sends and receives data blocks to and from a server and wrap it in some Java code to make it thread-safe. We are aware of the fact that C can trivially do this (e.g., pthreads) but want to use Java to illustrate the approach, in case we want to add object-oriented functionality or other wonderful things that the POSA 2 book describes in better detail. So to reiterate, here is our C code:

#include <stdio.h>


int* get_data(int block_number) {
    // do some input validation here
    printf("Retrieving block %d\n", block_number);
    // placeholder: a real implementation would return the block's data
    int* dummy_data = NULL;
    return dummy_data;
}

void send_data(int block_number, int* data) {
    // do some input validation here
    printf("Committing block %d\n", block_number);
}

So once you’ve installed the SWIG package, you need to write an interface file for your C code (analogous to a .h file for your .c file). Having looked over the tutorial, ours looks like this:

/* ccode.i */
%module ccode
%{
extern int* get_data(int block_number);
extern void send_data(int block_number, int* data);
%}

extern int* get_data(int block_number);
extern void send_data(int block_number, int* data);

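Note that we follow their tutorial’s example and put a copy of the method signatures outside the %{ ... %} block. It’s not particularly obvious from the tutorial why this is needed: the code inside %{ ... %} is copied verbatim into the generated wrapper file so that it compiles, while the declarations outside it tell SWIG which functions to wrap. Now we can just run each of these commands to generate the “glue” code that our Java code will use: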


swig -java ccode.i
gcc -c ccode.c ccode_wrap.c -I(location of jni.h)
ld -bundle -flat_namespace -undefined suppress -o libccode.dylib ccode.o ccode_wrap.o

The idea is that although this process is mucky (and presumably more so for non-trivially sized C code), we can now generate glue code for (almost) any language. A few things are worth noting before we go much further. The first line takes your interface file (here, ccode.i) and produces the wrapper (ccode_wrap.c). Last time I named my C file c-code.c, but SWIG doesn’t care very much for the dash and the error message doesn’t help much. The more interesting thing to note is that we have to write the .i file ourselves. Since it just contains the method signatures, it seems that a C parser or regex program could easily generate it, but alas I haven’t found one (SWIG’s %include directive can at least pull the declarations from an existing header file).

So on the second line, note that we have to include some header files based on our output language (here Java), so you’ll need to do a “locate jni.h” to find the right directory to include. Most important is the last line. Our dynamic library must be named lib(original name).dylib if you’re on a Mac (or lib(original name).so if you’re on Linux), because that is the file name System.loadLibrary looks for. This is completely NOT documented and I had to find it on a random message board, so I’ll say it again where the formatting ensures you will see it:


The dynamic library for ccode.c MUST be named libccode.dylib on Mac OS X (libccode.so on Linux)
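
If you’re not sure what file name your JVM expects, you can ask it. A minimal sketch (the class name LibraryNameCheck is made up for this example): System.mapLibraryName returns the platform-specific file name that System.loadLibrary will search for, and the java.library.path property lists the directories it searches.

```java
class LibraryNameCheck {
    // Returns the platform-specific file name the JVM maps a library name to.
    static String expectedFileName(String libName) {
        return System.mapLibraryName(libName);
    }

    public static void main(String[] args) {
        // "ccode" is the name we pass to System.loadLibrary in the wrapper class.
        System.out.println("loadLibrary(\"ccode\") looks for: " + expectedFileName("ccode"));
        System.out.println("in these directories: " + System.getProperty("java.library.path"));
    }
}
```

If the library sits somewhere else, you can point the JVM at it with -Djava.library.path=(directory) when you run your program.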

This is a lot of bullshit to go through if you didn’t have this guide to help you out (since the SWIG tutorial is ‘meh’ at best). Presuming you got through all that nonsense alright, you’re in the clear. SWIG generates tons of fun glue for you and you can end the day with this very simple Java code:

class RealWrapper {
    private static boolean libLoaded = false;

    public RealWrapper() {
        if (libLoaded == false) {
            System.loadLibrary("ccode");
            libLoaded = true;
            System.out.printf("C library loaded successfully.\n");
        }
    }

    public synchronized SWIGTYPE_p_int getData(int blockNumber) {
        return ccode.get_data(blockNumber);
    }

    public synchronized void sendData(int blockNumber, SWIGTYPE_p_int data) {
        ccode.send_data(blockNumber, data);
    }
}

Java solves our problem easier than Python did (no need to break encapsulation here since we’ve got static variables) although there is some ugliness from having to see the weird SWIG object types, but presumably there is some function that can convert these to familiar versions of the same object. Now, let’s wrap this up with our main method:

public class main {
    public static void main(String argv[]) {
        RealWrapper rw = new RealWrapper();
        SWIGTYPE_p_int data = rw.getData(10);
        rw.sendData(10, data);
    }
}
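
One caveat worth keeping in mind: synchronized on an instance method locks only that one object, so two separate RealWrapper instances could still end up inside the C code at the same time. Below is a rough sketch of one way around that, using a single static lock shared by every instance. FakeCcode is a hypothetical stand-in for the SWIG-generated ccode class (so the sketch runs without the native library), and SharedLockWrapper and runDemo are names I made up for the example.

```java
class SharedLockWrapper {
    // Hypothetical stand-in for the SWIG-generated ccode class. It records
    // how many callers are inside the "C code" at the same moment.
    static final class FakeCcode {
        static int inFlight = 0;
        static int maxInFlight = 0;

        static void send_data(int blockNumber, int[] data) {
            inFlight++;
            maxInFlight = Math.max(maxInFlight, inFlight);
            Thread.yield(); // give other threads a chance to interleave
            inFlight--;
        }
    }

    // One lock for the whole class, shared by every instance.
    private static final Object LOCK = new Object();

    public void sendData(int blockNumber, int[] data) {
        synchronized (LOCK) {
            FakeCcode.send_data(blockNumber, data);
        }
    }

    // Starts several threads, each with its own wrapper instance, and returns
    // the largest number of callers ever seen inside send_data at once.
    static int runDemo(int threads, int callsPerThread) {
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            SharedLockWrapper wrapper = new SharedLockWrapper();
            workers[t] = new Thread(() -> {
                for (int i = 0; i < callsPerThread; i++) {
                    wrapper.sendData(i, new int[0]);
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return FakeCcode.maxInFlight;
    }

    public static void main(String[] args) {
        System.out.println("Max concurrent callers: " + runDemo(4, 1000));
    }
}
```

With the shared lock, the count of concurrent callers never goes above one, no matter how many wrapper objects exist.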

So now we have an alternative to Python’s ctypes library that we can use for many of the “mainstream” programming languages. Since most languages also provide some way to talk to C code, these methods combined with SWIG provide a way to make a Wrapper Facade à la POSA 2 to give your C code new life as objects and all the fun stuff that goes along with that.

We could do many more posts about other technologies that are alternatives to this (for example, JNA is promising for Java, JRuby, and Jython), but I feel like I’ve beat the point into the ground that the POSA 2 guys now have reasonable alternatives to using only C++ to wrap C code in. Next time though I’ll give a solution to another gripe I had about one of their other patterns.

This brings up another good point. Why even care about this? Why gripe about a book that seems to be riddled with problems? It’s because the book has so much potential! The patterns for concurrency and networked objects are great! But it’s completely unreasonable to assume that we will only use C++ to do this. It may have been fine eight years ago, but not now. There are new, better ways to do this, so I see this as a way to give back. This is more of a “it’s a great book but this is what it really needs to be great and live up to its promises” kind of thing. (And on a purely selfish note, it helps me become better familiar with these technologies.)

profile for Chris Bunch at Stack Overflow, Q&A for professional and enthusiast programmers