Byzantine Reality


Searching for Byzantine failures in the world around us

Articles tagged with 'linux'

pconsole is awesome and you should use it...

…if you’re a sysadmin, of course. pconsole is a pretty cool utility that lets you type your *nix commands on one machine and have them run on every machine you need. It connects to terminal windows that are already open and logged in to the machines you want to run on, so the output is split up very nicely: if a command fails on one machine, it’s really easy to tell which one. And it works on Mac OS X out of the box (almost)! All you have to do is run “chmod +x /usr/local/bin/pconsole” once you’ve installed it. I think the obligatory picture from the pconsole site pretty much sums up how cool it is (although you can’t really appreciate it until you’ve racked your brain copying and pasting commands to many, many machines).
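pconsole does its thing by typing into terminal windows you already have open, and the sketch below is *not* how pconsole works. It’s just a minimal Python illustration of the same idea: send one command to a list of machines in parallel and keep each machine’s output (and exit code) separate, so a failure on one box is easy to spot. It assumes key-based SSH so no password prompt blocks the run; the host names `node1` and `node2` are hypothetical.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def ssh_argv(host, command):
    # Assumes passwordless (key-based) SSH to each host.
    return ["ssh", host, command]


def run_everywhere(hosts, command, argv_builder=ssh_argv):
    """Run `command` on every host in parallel; return {host: (exit_code, output)}."""
    def run_one(host):
        proc = subprocess.run(
            argv_builder(host, command),
            capture_output=True,
            text=True,
        )
        # Keep stderr with stdout so a failing host shows its error message.
        return host, (proc.returncode, proc.stdout + proc.stderr)

    with ThreadPoolExecutor(max_workers=max(1, len(hosts))) as pool:
        return dict(pool.map(run_one, hosts))


if __name__ == "__main__":
    # Hypothetical host names; replace them with your own machines.
    results = run_everywhere(["node1", "node2"], "uptime")
    for host, (code, output) in sorted(results.items()):
        print(f"=== {host} (exit {code}) ===")
        print(output, end="")
```

The `argv_builder` parameter exists so you can swap SSH for something else (or for a local command when testing) without touching the fan-out logic.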

[Screenshot from the pconsole site]

Thanks, pconsole, and thanks to Server Fault! It’s the new sister site to the programming Q&A site StackOverflow and looks pretty nice so far. Since I’m more in programmer-land than sysadmin-land, I don’t get quite as much use out of Server Fault as StackOverflow, but it did lead me to this gem called pconsole!

You Should Use 'Screen'

Why have I not heard about screen until now? It’s an awesome little app that lets you create terminal windows on the fly, switch between them, and, most useful of all, detach windows and resume them later. There have been SO many times when I’ve had an app that prints to standard out, needs to run in the background, and can’t do that on its own.

Even if you have a nice little app that makes terminal windows for you (I have iTerm and love it), the detaching ability is well worth screen’s small learning curve. It comes standard on Mac OS X and is a quick install on Ubuntu and friends. This walkthrough does a great job teaching all the screen commands and how to use them, so if you end up using screen religiously, you’ll either be there a lot or end up memorizing all the commands.
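From a script, the detach trick boils down to starting a session that is already detached and reattaching to it by name later. Here’s a minimal Python sketch, assuming screen is on your PATH; the session name `logger` and the program `chatty_app.py` are hypothetical stand-ins for your own stdout-happy app. It uses screen’s `-d -m` (start detached), `-S` (name the session) and `-r` (reattach) options.

```python
import shlex
import subprocess


def detached_session_argv(name, command):
    # -d -m: create a new session without attaching to it; -S: give it a name.
    return ["screen", "-dmS", name, *command]


def reattach_argv(name):
    # -r: resume (reattach to) a detached session by name.
    return ["screen", "-r", name]


def start_in_background(name, command):
    """Launch `command` (a list of arguments) inside a detached screen session."""
    subprocess.run(detached_session_argv(name, command), check=True)


if __name__ == "__main__":
    # Hypothetical long-running program that only writes to standard out.
    start_in_background("logger", ["python3", "chatty_app.py"])
    print("Reattach later with:", shlex.join(reattach_argv("logger")))
    print("List sessions with: screen -ls")
```

Once you’re reattached, Ctrl-a d detaches the session again and leaves the program running.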

Xen on the Xeon

Over the last three weeks I’ve had nothing but trouble trying to get a little cluster set up in our lab, and even more trouble trying to get Xen to work. The biggest frustration is that these aren’t bleeding-edge, untested, beta-quality pieces of garbage: they’re technologies with a lot of investment behind them! Thankfully, we finally resolved some of these problems, so for those of you with HyperThreading-enabled Xeon CPUs, here’s how we did it:

Step 1: Go into the BIOS and turn off HyperThreading.

Step 2: There is no Step 2.

That’s it. That’s all it took to get rid of the evil kernel panic plaguing me after I installed Xen on our Xeon box. Specifically, after installing Xen, trying to create a virtual machine causes this to happen:

[ 300.375060]
[ 300.375176] Pid: 14620, comm: gzip Not tainted (2.6.24-19-xen #1)
[ 300.375298] EIP: 0061:[] EFLAGS: 00010a13 CPU: 1
[ 300.375420] EIP is at 0xc1bb5429
[ 300.375539] EAX: c1bb9a60 EBX: c1bb3460 ECX: 00000000 EDX: 00000000
[ 300.375660] ESI: 00000001 EDI: 40040000 EBP: 00000000 ESP: e9eadd00
[ 300.375783] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0069
[ 300.375903] Process gzip (pid: 14620, ti=e9eac000 task=ec529830 task.ti=e9eac000)
[ 300.376026] Stack: c01623a5 00000000 00000000 e9eadd44 00000001 00000001 e9eadd3c c0162456
[ 300.377094]    c1bb9a60 c1522174 00000001 00000001 c0165997 00000001 c0176681 00000001
[ 300.378146]    00000000 c1bb9a60 15852ff8 f578e000 00000000 c0000000 c0169d61 c1bb9a60
[ 300.379199] Call Trace:
[ 300.379427] [] free_hot_cold_page+0x195/0x220
[ 300.379661] [] __pagevec_free+0x26/0x30
[ 300.379890] [] release_pages+0x137/0x160
[ 300.380116] [] move_page_tables+0x611/0x800
[ 300.380346] [] dec_zone_page_state+0x21/0x70
[ 300.380574] [] free_pgd_range+0x26c/0x370
[ 300.380804] [] free_pages_and_swap_cache+0x74/0xa0
[ 300.381043] [] setup_arg_pages+0x284/0x290
[ 300.381275] [] load_elf_binary+0x3d9/0x1c90
[ 300.381508] [] file_read_actor+0x0/0x100
[ 300.381738] [] current_fs_time+0x13/0x20
[ 300.381971] [] follow_page+0x20d/0x410
[ 300.382205] [] get_user_pages+0x163/0x540
[ 300.382437] [] get_arg_page+0x4b/0xb0
[ 300.382667] [] load_elf_binary+0x0/0x1c90
[ 300.382893] [] search_binary_handler+0x9a/0x1e0
[ 300.383123] [] do_execve+0x1a6/0x1d0
[ 300.383350] [] sys_execve+0x2f/0x80
[ 300.383577] [] syscall_call+0x7/0xb
[ 300.383805] [] vcc_getsockopt+0x110/0x170
[ 300.384036] =======================
[ 300.384154] Code: 20 00 00 00 00 40 01 00 00 00 ff ff ff ff 00 00 00 00 00 00 00 00 00 00 00 00 00 01 10 00 00 02 20 00 00 20 00 40 01 00 00 00 ff ff ff 14 70 bb c1 00 00 00 00 20 cb bb c1 00 01 10 00 00 02
[ 300.390976] EIP: [] 0xc1bb5429 SS:ESP 0069:e9eadd00
[ 300.391340] ---[ end trace 750438800d7fe836 ]---

Similar messages come up on my other Xeon boxes when I try to copy files, but since they don’t have HyperThreading, the same fix doesn’t apply there. However, since they work fine when they aren’t running the Xen-modified kernel, we’ll likely just re-task them for other work.

But that’s beside the point: what the hell about this error message would have told you that HyperThreading was the culprit? I’m certainly no newcomer to Linux (although I don’t do any kernel hacking), but there is no way a reasonable user could have figured that out. Xen was announced five years ago; like I said, this is not a new technology. The Hardware Compatibility List includes Xeon servers, so we know they’ve tested Xen on Xeon boxes. It’s obviously impossible to test every combination of hardware and software, but there’s no excuse for seeing this on six different Xeon boxes with different hardware (only one of which had HyperThreading).

That being said, I’ve learned an important lesson today:

If you’ve tried everything logical to fix a problem and failed, try the most illogical thing you can think of.
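Since only one of those six boxes had HyperThreading, it helps to be able to tell from Linux which machines actually have it switched on. The `ht` flag in /proc/cpuinfo only says the CPU *supports* HyperThreading; a common heuristic for whether it’s *enabled* is to compare `siblings` (logical CPUs per physical package) with `cpu cores` (real cores per package). Below is a minimal Python sketch of that heuristic. Older kernels, and possibly a Xen dom0’s view of the CPUs, may not report both fields, in which case it gives up and returns None rather than guessing.

```python
def parse_cpuinfo(text):
    """Split /proc/cpuinfo text into one dict per logical processor."""
    processors, current = [], {}
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends one processor's block.
            if current:
                processors.append(current)
                current = {}
            continue
        key, _, value = line.partition(":")
        current[key.strip()] = value.strip()
    if current:
        processors.append(current)
    return processors


def hyperthreading_enabled(text):
    """True if any package reports more logical CPUs (siblings) than cores.

    Returns None when the kernel doesn't report both fields.
    """
    procs = parse_cpuinfo(text)
    if not procs or not all("siblings" in p and "cpu cores" in p for p in procs):
        return None
    return any(int(p["siblings"]) > int(p["cpu cores"]) for p in procs)


if __name__ == "__main__":
    with open("/proc/cpuinfo") as f:
        print("HyperThreading enabled:", hyperthreading_enabled(f.read()))
```

Running this before and after the BIOS change is a cheap way to confirm the setting actually took.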

Setting up Linux on the Cell (Part 2)

So… Yellow Dog Linux doesn’t like SDK 2.1 very much. It turns out I must have done something wrong while installing it, since the SDK only really wants to work on Fedora, and I fucked up my ability to compile code for the SPEs and PPE in the process. So I reinstalled Yellow Dog, and this time I followed the very simple explanation on the RapidMind website, which said to just install the libspe2 rpm and not mess around with anything else. It worked great, but let’s step back a second here.

SDK 2.1 (and, I presume, the new SDK 3.0) installs a Cell Simulator on your machine. What the hell is the point of that? Why is it installing a Cell Simulator on my Cell? Sure, that makes sense if you’re on a Fedora box that isn’t a Cell and you want to develop Cell code on it. But if I want to develop on my Cell with the new SDK, don’t make me go find a non-Cell box to do it! To quote what I’ve been hearing a lot lately, “What were they thinking?” IBM’s official instructions say I can do this on a Cell box, so what the hell is the problem here?
/rant

So reinstalling Yellow Dog and RapidMind according to their instructions works fine. But I do have a couple of gripes, which may just be standard Linux issues that I somehow never noticed until now:

    If I sudo up to root, I should be able to do anything that root could do if root logged in! But I can’t! I can’t boot into the PS3 game OS, and I can’t even add groups! What the hell! Completely ridiculous.
    (Not a Linux problem) X/GNOME looks like shit on my monitor. Holy crap, I never realized how bad the PS3’s analog output looks until I fired up X. If you have the choice, get a digital connection and a monitor that supports HDCP. I’ll put up a screenshot of how miserable this looks sooner or later.

I’m sure there are other things, and I’ll post them as I remember them. Aside from those little gripes, the environment is actually very nice. Here’s a nice how-to that helped me out a bit: CellPerformance.Com. I’d link directly to the RapidMind page too, but you have to log in for that (although signing up is free).

Anywho, we’re gonna benchmark the Cell versus a slightly-faster-than-average PC and do some other cool stuff with that. Check back later for how that turns out.

Setting up Linux on the Cell (Part 1)

So we got a PS3 for research in the lab the other day, and we got to set up Linux on it. It’s actually pretty painless, although if you can, get a monitor that supports DVI and HDCP so you can use the digital connection. We only have analog in the lab, so it looks like crap. Seriously, it looks like feces, and I have to use SSH to get anything done. That being said, here’s how to do it:

This actually isn’t a how-to or anything: we just followed the directions on the IBM website and it was all set up. I actually wanted to set up RapidMind to fiddle with parallelizing applications in an “easier” fashion (check back later to see if it actually was easier), and for that I needed libspe2. Libspe1 comes with Cell, but to set up libspe2 you’ll need to install the Cell SDK 2.1 (SDK 2.0 comes with Cell and with the IBM link above; use this link for 2.1). I’m installing it right now; if it doesn’t work, I’ll let you know, and if it gets really bad, this will become a how-to. Either way, I’m looking at using Cell for multigrid method solving and visualization, or maybe something unrelated, but I’ll let you know.
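Since the whole point of the SDK detour is getting libspe2, a quick sanity check is to ask which libspe runtime the system linker can actually find. Here’s a minimal Python sketch using `ctypes.util.find_library`, assuming the libraries are installed under the usual names libspe.so (1.x) and libspe2.so. It only checks that the library is findable, not that SPE programs will actually build or run.

```python
from ctypes.util import find_library


def spe_runtime_versions():
    """Report which libspe runtime libraries the system linker can find."""
    # find_library takes the name without the "lib" prefix or ".so" suffix
    # and returns None when nothing matches.
    return {
        "libspe (1.x)": find_library("spe"),
        "libspe2": find_library("spe2"),
    }


if __name__ == "__main__":
    for name, path in spe_runtime_versions().items():
        print(f"{name}: {path or 'not found'}")
```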

UPDATE (10/30/07): Turns out installing SDK 2.1 went terribly. More on this later.
