Byzantine Reality

Avatar

Searching for Byzantine failures in the world around us

Articles tagged with 'rapidmind'

State of the Cell

Originally posted at http://cs.ucsb.edu/~cgb/stateOfTheCell.html, and thus, looks much better there.

The Cell Broadband Engine Architecture (which we shall refer to as simply the Cell architecture) was designed as a compromise between the general-purpose but slower CPU and the specific-purpose and faster GPU. It is a heterogeneous architecture: it contains processing units that specialize in different tasks. However, critics (and even some fans) of the Cell architecture claim that it is incredibly difficult to produce good, fast code on it. Having spent the last quarter working with the Cell architecture, we agree with this sentiment. But why?

Our paper, ‘Lack of Abstraction Considered Harmful’, provides a review of work on Cell as well, work we have done on Cell, and an analysis of this issue. This web page serves as a summary of that paper.

Page Layout

  • Introduction
  • Algorithms and Technologies Used
  • Design Process
  • Results
  • Pros and Cons of Cell
  • Introduction

    In order to test how easy it is to program fast code on the Cell architecture, we implement two different algorithms: the Hadamard product and matrix multiplication. We implement these algorithms on our Cell system at Fred Chong’s Architecture Lab (the Archlab). Our Cell system is a PlayStation 3 with Yellow Dog Linux 5 and the Cell SDK 2.0.

    Algorithms and Technologies Used

    Yellow Dog Linux 5 and Cell SDK 2.0

    Detailed instructions exist for installing Yellow Dog Linux 5 and Cell SDK 2.0 on the PlayStation 3, as well as for Fedora Core 7 and Cell SDK 3.0 on the IBM Blade Servers, but none exist for getting the new Cell SDK 3.0 on the PlayStation 3 (Update). The company that supports Yellow Dog Linux, Terra Soft, offers the SDK with support, but for $4000, it is far out of our availability. Furthermore, although the SDK can be downloaded for free and used on other Linux boxes as a cross-compiling system, we were unable to get it to work on Gentoo Linux or Debian Linux. They seem to specifically require Fedora Core, which we did not have access to. Since these newer SDKs contain the optimized IBM XL C Compiler (IBM cross-compiler for C), we used the versions of gcc for the PPE and SPEs.

    RapidMind

    RapidMind is a metaprogramming language for C++ that allows the programmer to take their existing C++ code and port it to the Cell architecture (or the GPU architecture) with seemingly little work involved. The programmer annotates the parts of their code where parallelization can be performed, and the RapidMind framework converts this code to “optimized” C++ code. The speed of the resulting code varies vastly on the skill of the programmer, as we will show.

    Cell Broadband Engine Architecture

    We defer a detailed analysis of the Cell architecture to its Wikipedia article and to the presentation we gave at UC Santa Barbara on contributions to the IBM cross compiler, as the novel architecture has been discussed extensively in the various papers we have reviewed this quarter.

    Hadamard product and Matrix Multiplication Algorithms

    Both of these algorithms are relatively simple as far as computer science goes. Although matrix multiplication will be familiar to computer scientists, the Hadamard product may not be. It takes two matrices and produces a new matrix of the same dimensions, with each element being the product of the two elements given as input.

    Design Process

    For our comparisons we use three renditions of the Hadamard product algorithm and three renditions of the matrix multiplication algorithm. For the Hadamard product, we have constructed one version that only runs on the PPE, a version that uses the PPE and four SPEs, and a final version that uses RapidMind. We only use four of the six SPEs since the matrices we are multiplying are square and thus it is easier to split up into four blocks instead of six. For the matrix multiplication algorithm, we use one version that only runs on the PPE. We also wrote a version that uses RapidMind, but is a more naive version compared to the optimized RapidMind version distributed by RapidMind. The code for the optimized RapidMind version can be found at their website, while the versions we have implemented are available here.

    Results

    For the six algorithms we have run on Cell, we see the following performance:

    hadamard.png

    matrixmultiply.png

    We see that in both cases, our PPE only code runs faster than our naive RapidMind implementation by three orders of magnitude (a factor of 1000 difference). This scales up exponentially with the number of rows in the matrices (since the y-axis is on a log scale), and we also see that the optimized RapidMind and our PPE / 4 SPE programs scale up much better. Unfortunately, both compilers / frameworks were difficult to work with (although in different ways). We were unable to send more than 16 KB of data to the SPEs, so as a result we have no data for the Hadamard product with more than 16 KB of data. Next, we discuss the difficulties and successes we have with Cell’s gcc / spu-gcc and RapidMind.

    Pros and Cons of Cell’s gcc / spu-gcc

    Pros:

  • Super-fine level granularity over the architecture assures the best performance out of the architecture. Some of the papers we’ve seen this quarter have shown the amazing speedups Cell can achieve. If you don’t mind spending the extra time to really work out your algorithm, its speed will certainly end up first-class.
  • Can program in assembly or C as needed. Certainly most of you will prefer C, but to get the fastest programs, the masochist in you will want to write this program in assembly. Take my advice though: don’t do it. We’ve argued that Cell needs more abstraction, so don’t go with less by programming in assembly. But if you really wanted to, at least you could.
  • Being able to use I/O (esp. printf) inside of the SPEs kicks ass. It’s much slower but gives the bad debugger some extra data to work with. Contrast this with CUDA (see Multigrid on CUDA), where you are not allowed to hit the screen’s I/O at all and have to save outputs in variables that the CPU can print out later.
  • Cons:

  • Having to manage mandatory factors such as memory alignment and data transfer is obscenely difficult. It literally should be a crime to make the programmer do this. There’s no reason the compiler can’t, especially given the fact that all other compilers seem to do this.
  • Having to manage optional factors such as branch prediction is ridiculous. Just like before, there’s no reason that the compiler can’t look at my loops and guess what’s needed next most of the time. This goes double considering the fact that branches are predicted not-taken ALWAYS. Come on! Programmers write code expecting it to get used! We don’t write branches knowing they’ll never get used! Even guessing always taken would be better than always guessing not-taken.
  • Inconsistency between data types on the PPE and SPEs. I know, they have a different instruction set and are optimized for different things, blah blah blah. But I should be able to define a vector of a certain type and have that code work on the PPE or SPEs, even if it gets translated differently or performs differently. I shouldn’t have to memorize the different types of vectors for the PPE and SPEs just to get some use out of them.
  • Pros and Cons of RapidMind

    Pros:

  • Since RapidMind is built on top of C++, it is very simple to reuse small programs with the framework.
  • RapidMind abstracts the difficult-to-deal-with hardware away from the programmer. This is surely RapidMind’s greatest strength, as the lack of abstraction hurts the programmer’s productivity more than anything else we saw.
  • RapidMind is free for academic use! It’s easy to set up and has good documentation, so if you’re an academic type and have a PS3, go check it out!
  • Cons:

  • Resolving compiler issues is virtually impossible. See the screenshot from using RapidMind on the GPU in the Multigrid Methods on CUDA project to see more about that.
  • You need to be an expert at RapidMind to write fast RapidMind code. We constructed our program by following the recommendations at the developer’s site, but our final code for matrix multiplication was far different from the RapidMind team’s. They give many ways to solve problems but no “best practices”, leaving us with an algorithm that is substantially slower than theirs.
  • Some RapidMind variable types do not work as expected. RapidMind’s boolean type cannot be used alone in conditional statements, and having to refactor code that uses that eats a lot of time and readability out of the program.
  • Conclusion

    We have seen on Cell that it is currently difficult to write fast code, and difficult to write any code at all. This goes double for developing with gcc and spu-gcc. In order for Cell to be a viable language, tools will need to be developed that abstract away the various parts of the hardware we’ve encountered difficulties with (memory alignment, sending data to SPEs, etc). Doing so would not have been out of reach for the Cell developers, yet it appears they wanted the potential to get the maximum performance out of the hardware. Unfortunately, this decision makes it brutually difficult for the average programmer to get any real programming done. It also makes the programs dependent on the specifics of this version of the architecture. To illustrate this point: what do you have to change in your algorithm if a new Cell chip is designed with 10 SPEs and 512 KB local store instead of the current layout (8 SPEs and 256 KB local store). You could probably keep the same performance as before easily, but to get the most out of the architecture, you need to refactor your entire program. Why not let the compiler / scheduler / whatever do this for you? Computers were meant to make our lives easier, remember? I guess I finally see what the big deal about Ruby is, with Matsumoto’s “crazy” idea:

    “Often people, especially computer engineers, focus on the machines. They think, ‘By doing this, the machine will run faster. By doing this, the machine will run more effectively. By doing this, the machine will something something something.’ They are focusing on machines. But in fact we need to focus on humans, on how humans care about doing programming or operating the application of the machines. We are the masters. They are the slaves.”

    Hope is not lost for Cell. In fact, its quite the opposite. The potential for Cell is astounding. We just need to start harnessing that power the right way. Software complexity is already increasing on its own. We shouldn’t be making it even more complex. Here’s a more concise article that covers the same issue.

    Chris Bunch

    Update 12/22/07: So it turns out there are a couple guides for installing Fedora Core 7 on the PS3. I’m gonna try them out next week and see how they go.

    Multigrid Methods with CUDA

    Originally posted at http://cs.ucsb.edu/~cgb/multigridCUDA.html, and thus, looks much better there.

    Over the last quarter I have been investigating how to implement a multigrid algorithm with CUDA, a new language aimed at making General Purpose Computation on GPUs (GPGPU) easier. This web page documents this experience.

    Introduction

    It is not always possible to solve a partial differential equation (PDE). In fact, exact solutions are not known for most PDEs. With the advent of computers, however, we can obtain good approximations to these solutions. One class of methods that are used to solve PDEs are the Multigrid Methods. Compared to other types of methods used to solve PDEs, they require less iterations and thus are generally faster. For this project, we have elected to solve a common PDE, the heat equation in three-dimensions, given by the following equation:

    generalheateqn.png

    Implementing multigrid methods is not a new problem. Therefore, we choose to implement them on a new platform, the Graphics Processor Unit (GPU). This has become possible only recently with the advent of GPGPU languages such as NVIDIA’s CUDA (Compute Unified Device Architecture) and ATI’s CTM (Close To Metal), amongst others. Since we already have an NVIDIA GeForce 8800 in our lab, which is CUDA-capable, we used CUDA to implement this project.

    Algorithms and Technologies Used

    CUDA

    CUDA is a C-like language that allows programmers to “easily” port their code for use on the GPU. The language is mostly a super-set of the C language, allowing the programmer to flag which methods are run on the GPU and where variables are stored on the GPU. Although it is presented as a high-level language, low-level parts of the architecture are exposed to the user. As is the case with C, the programmer can allocate memory and de-allocate it (via cudaMalloc and cudaFree), but the programmer can choose specifically on the GPU where they want to put it. The following figure is from the CUDA documentation, showing the various memory locations available to the user:


    hwmodel.png

    Here, we can see that we can allocate memory in the graphics card’s shared memory, constant cache, texture cache, and device memory. Although the device memory is the slowest to access, it has the most space and allows us to not have to worry about cache coherence in the shared memory. Thus, we use the device memory. A critique of the CUDA language is given in the “Pros and Cons of CUDA” section.

    Multigrid Methods

    Typical methods used to solve PDEs represent the simulated area as a three-dimensional grid (or matrix). Therefore, it is easy to represent on a computer. Furthermore, we can represent the heat equation as solving a linear system of equations. The general and specific representations are as follows:


    linearsyseqn.png


    specificheateqn.png

    For our implementation, k is 1, A is the Laplace operator, x is u, and b is Ut. Three simple methods exist for solving this system of equations: The Jacobi Method, the Gauss-Seidel Method, and the Red-Black Gauss-Seidel Method. Our explanation for choosing the Red-Black Gauss-Seidel Method is given later, since this section focuses on Multigrid Methods in general.

    As was previously stated, regular methods use one grid and solve it using any of the aforementioned methods. Multigrid methods differ in the sense that they solve many smaller grids and use those as approximations to the solution of the full size grid. This is necessary because solving only one large grid does not give the fine-grained accuracy that solving the smaller grids does. Furthermore, solving many small grids once and the large grid a small number of times is computationally faster than solving the large grid many times. The methods needed to implement this will be described in the design section.

    Red-Black Gauss-Seidel Method

    The simple methods used to solve a linear system of equations are the Jacobi Method, the Gauss-Seidel Method, and the Red-Black Gauss-Seidel Method. As shown in A Multigrid Tutorial, the Red-Black Gauss-Seidel method converges the fastest out of these three methods (requires the least number of iterations). An added benefit specific to CUDA is that this method is easily parallelizable. In the standard Gauss-Seidel method, grid points must be updated one at a time. Although this works fine for serial computation, it is a poor choice for the parallel case. For Red-Black Gauss-Seidel, the odd points are all updated in parallel, and then the even points are all updated in parallel. The computation used to update each point is given by the average of that point’s neighbors.

    Design Process

    This being my first academic project since reading Code Complete, I decided to develop a prototype first. My C++ prototype simulates a 2D grid situation. It was mainly used to get a feel for the methods needed for the CUDA version, and as a result, is incomplete. It ended up mapping over to CUDA well, as follows:

    C++ Method NameCUDA Method NamePurpose
    ConstructorInitialize / MG_CUDA_AllocateMemCreates a grid. The C++ version needs a grid size; the CUDA version creates all the subgrids from the given size to a preset limit. Each subgrid is half the size of the previous grid.
    DestructorClose / MG_CUDA_FreeFrees the memory a grid has allocated. The CUDA version uses cudaFree to release all the memory it has been given via cudaMalloc.
    setupInitialConditionsHeatToggleThe grid is initialized to 0 degrees at all points by the constructor. The C++ version of this method sets all points in the first row to be 100 degrees, and the CUDA version sets the innermost area of the cube to be 200 degrees. This is done so that we will be able to see the heat flow throughout the geometry.
    restrictGridMG_CUDA_RestrictGrid / MG_CUDA_RKernelCreates a new grid half the size of the original grid. The C++ version copies down the points from the bigger grid to the smaller one, while the CUDA version sets each point to be the weighted average of the points around it.
    interpolateGridMG_CUDA_InterpolateGrid / MG_CUDA_IKernelCreates a new grid twice the size of the original grid. Both versions sets each point to be the weighted average of the points around it.
    solveGridMG_CUDA_SolveGrid / MG_CUDA_SGKernelBoth versions solve the given grid by using the Red-Black Gauss-Seidel Method. The odd numbered points are updated to be a weighted average of the points around them, and the process is repeated for the even numbered points. The C++ version does this until 50 iterations are done, while the CUDA version does 1 iteration.
    computeResidualMG_CUDA_ComputeResidual / MG_CUDA_CRKernelTakes the difference between each point in two grids and returns a new grid containing this difference.
    applyCorrectionMG_CUDA_ApplyCorrection / MG_CUDA_ACKernelAdds in each point in a grid to the given grid and returns a new grid.
    printGrid{none}Used for debugging purposes only while the visualization scheme was being developed.
    mainSimulatePerforms one iteration of our multigrid solver, which consists of restricting the grids with data from the previous iteration, solving the smaller grids, and solving the largest grid.

    The code for the multigrid prototype can be downloaded here.

    Results

    For the sake of timing purposes, we consider each execution of the ‘Simulate’ method to be one step, and we time how many steps can be performed per second. We test this for both the CUDA and C multigrid implementations for grids of size 16 x 16 x 16 through 256 x 256 x 256. We attempted to go larger (to 512), but the CPU’s memory allocation function (malloc) failed and could not allocate us the 512 MB of memory we needed for each of the four arrays we manipulate on the CPU (note the 512 MB matching up with the grid size 512 is coincidental). That being said, the speedup we achieve is given as follows:

    results.png

    For the purposes of this experiment, the % difference is also the speedup between the CUDA and C versions of the multigrid method.

    data.png

    What is interesting to note is that the CPU version is actually faster on the grid of dimension 64, and that the GPU is not notably faster as we have seen in other CUDA GPU applications. The reason for this is two-fold. First, we were unable to resolve issues we were having getting the Red-Black Gauss-Seidel method to work on our largest grid, so it had to be done on the CPU, and the integer modulo operator we use to get the even and odd points for Red-Black Gauss-Seidel works very fast on the CPU and very slow on the GPU. Furthermore, the grid was stored as a one dimensional array in memory, and to recover the three dimensional array structure, each point had to perform a swizzling method, which also involved using integer modulo operations. Add in the fact that we do this 16 million times per step (once per point on the grid, so 16 million is for the 256 grid) and you see why our speedup was not as good as it could have been. A low resolution video of our CUDA multigrid implementation has been recorded along with a high-resolution screen capture. Both are provided here in that order:


    Multigrid Preview Pic

    Multigrid Preview Pic

    Unfortunately, the video recording software we used does not record the high-frame rates of our program very well and thus lags much more than the actual version does. Furthermore, our version is more brilliant in terms of color as the screenshot on the right reveals. We use the a simple 3D rotation technique based on which axes the user is manipulating for rotating the cube, and we provide mouse combinations to allow the user to zoom in and out of the cube and translate themselves in space about the cube. These features are all documented in the above video. As we have refactored Brent Oster’s C code to produce our CUDA version, we have kept the interaction mechanisms he has implemented intact. We have added functionality to switch rendering on and off to be able to accurately measure how many steps are computed per second, as the rendering negatively impacts it.

    On December 14, 2007, this material was presented to our ‘GPGPU and 3D User Interfaces’ class. A PDF of the presentation can be found here. I’m currently working on putting the code up. The code used for this project consists of three files: NC_Multigrid.cpp, NC_Multigrid.h, NanoCAD2.cpp, and the CUDA source code, Multigridcu.cu. They can be found at the location “Projects/NanoCAD2/Modules/”, with the exception of the driver file, which is at “Projects/NanoCAD2/NanoCAD2.cpp”.

    Pros and Cons of CUDA

    CUDA provides a novel way for the programmer to use the graphics hardware to its fullest. It is both a high and low level language, but because it is a relatively new language, it is not without its downsides.

    Pros:

  • CUDA abstracts as much of the hardware away as you want it to. Although to really get the best performance you need to dive under the hood and program for the exact hardware you’re on, you can still get a decent speedup without worrying about it too much. The future of parallel programming languages will rely on their ease of use and how much they can abstract away from the programmer.
  • CUDA is an important step forward in parallel computing. The design choices made by the CUDA team will shape how future languages deal with heterogeneous processing systems (here a CPU and a GPU).
  • It’s really cool to launch 16 million threads and have them do something actually useful. Using less than 10 threads at a time in undergraduate classes gives you the impression that you can do everything you’d ever want to with just 1 or 2 threads. But it’s something else to launch a thread for each point in your grid, and if you have a 256 x 256 x 256 grid, launching 16 million threads really gives computing a whole different spin.
  • There is a lot of potential behind CUDA. Making supercomputing available to the general public will make it much easier to teach new programmers about how to program parallel machines. There’s also lots of work to be done resolving the downsides of CUDA, which are:
  • Cons:

  • Developing in CUDA on Windows with Visual Studio 2005, the supported environment of CUDA, is a nightmare. Getting started with CUDA was definitely the hardest part of this entire project. It took me no less than two weeks to make a new project and work with it. That being said, I’m no newcomer to Windows (used it almost my whole life) and Visual Studio 2005 (used it to develop C# applications). However, CUDA is something entirely different. There is a “custom build rule” for CUDA that take a lot of the work out of setting up Visual Studio for use with CUDA, but even after following the directions on using it, it still wouldn’t work. Even taking the sample projects that came with CUDA and trying to refactor them only worked for a short while, although this may have had to do with…
  • Remote Desktop Connection’s interaction with Windows makes it unusable. I know, this isn’t a CUDA problem, as RDC isn’t supported by CUDA, but there’s no reason it should be any different from using Visual Studio with other languages. RDC didn’t seem to consistently convert my newlines on Mac (Line Feed) to Windows (Carriage Return + Line Feed), so Visual Studio thought my code was only one line long. It didn’t stop the code from working, but it’s incredibly annoying to have to deal with those error messages about having a problem somewhere on your one line of code and having the compiler point at the beginning of your code as the problem. However, since the same code works fine with the right newlines, I’d be inclined to say the CUDA compiler (nvcc) doesn’t seem to like the mix of Line Feeds and Carriage Returns that RDC is putting in there. So I had to ditch RDC and sit in front of the computer I was developing on (as I was having issues getting VNC to work). I was having problems getting emulation mode to work, even after copying all the emulation mode flags the compiler needs, so I had to use the regular mode. That’s not a problem, however, until you see what happens when you use too many blocks in regular mode:
  • theusual.jpeg

    This wouldn’t be so bad except for the fact that this happens almost every time something goes wrong, not just with the blocks. In the last four hours of programming in CUDA, this has happened to me FIVE times. Future implementations of CUDA NEED to fix this for CUDA to become even close to viable as a good programming language. My computer shouldn’t die just because I used memory that I allocated and set to zero but forgot to copy the host’s version of to. That leads into the next issue…

  • CUDA inconsistently throws errors / crashes. Usually after I see the blue screen of death, I tweak the code and try to compile it, only to see one of thse two screens:
  • linker.png
    unexpected.png

    Trying to recompile the code gives no errors and the code runs fine. So what’s the deal? I can’t even get the blue screen of death consistently! Sometimes Visual Studio gives me a cudaError_enum, sometimes the computer locks up, and sometimes it gives me the blue screen. Why can’t it just be consistent, and consistently the first result?

  • GPU methods cannot call GPU methods, and consequently, recursion is not allowed. This is an unfortunate side-effect, and has caused the GPGPU teams to reimplement all the well-known recursive functions (e.g., quick sort) iteratively or find iterative algorithms that suit their purposes better (e.g., bitonic merge sort for the GPU). At this time, it is unclear whether this issue is CUDA-specific or specific to the GPU architecture.
  • The documentation, while long, fails to mention important macros you can call to improve the quality of your code. If you read the documentation alone, you’d have no idea there was a macro you could wrap your cudaMalloc and cudaFree calls around to make sure they succeeded or your program died. It’s only two extra macro to document, and perhaps two of the most useful macros in CUDA. I can’t tell you for sure, because I’ve only read the documentation…
  • CUDA will bring out your worst software engineering skills. This could be the greatest offender, depending on your programming skills. Structs aren’t allowed as of CUDA 1.0, forcing the programmer to write methods that pass a huge amount of data around. My typical method needed to throw around 6 or 7 variables when one or two well built structs would have been fine. Furthermore, without structs or objects (since it’s C and not C++), you’ll find yourself violating the DRY (Don’t Repeat Yourself) principleconstantly. If you’re not familiar with the DRY principle, you may think this isn’t a big deal, as you’re just repeating yourself. But this makes your code harder to maintain, even for just you who knows the code, and with CUDA being a new language, there’s no reason not to have a construct that packages up data. Perhaps it has to do with the architecture like recursion did…
  • Conclusion

    Whew! What a long review! It’s been a great quarter full of new opportunities, and thanks to all for letting me use what I’ve needed to get this done. Special thanks go out to Tobias Hollerer and Brent Oster for use of the hardware and software, respectively, that was used or refactored in some way for this project. CUDA has the potential to go far for sure, but there’s definitely some huge kinks it needs to work out before it’s ready for the general public.

    My Favorite Error Message Ever

    While running a Poisson Equation Finite Difference approximation algorithm (whew, a mouthful there!) today on Cell, I got the greatest error message ever:

    [cgb@cell pfd]$ ./withRapidMind
    terminate called after throwing an instance of ‘rapidmind::Exception’
    what(): The type ‘double’ is not supported by the Cell backend.
    Aborted

    What the hell indeed. Much thanks to the RapidMind guys for an awesome function called “what” and not supporting the double type (although I’m sure there’s a long explanation about that I haven’t found yet).

    Setting up Linux on the Cell (Part 2)

    So…Yellow Dog Linux doesn’t like SDK 2.1 very much. Turns out I must have done something wrong while trying to install it because it only wants to work on Fedora and I fucked up my ability to compile code for the SPEs and PPE while doing so. So I reinstalled Yellow Dog and this time, followed the very simple explanation at the RapidMind website which said just to install the libspe2 rpm and not mess around with anything else. It worked great, but let’s step back a second here.

    SDK 2.1 (and I presume the new SDK 3.0) install a Cell Simulator on your machine. What the hell is the point of that? Why is it installing a Cell Simulator on my Cell? Yes, this works fine if you’re on a Fedora box that you want to develop code for Cell on that isn’t a Cell box. But if I want to develop on my Cell with the new SDK, don’t make me go get another box that isn’t a Cell to go do it! To quote what I’ve been hearing a lot lately, “What were they thinking?” The IBM official instructions say I can do this on a Cell box, but what the hell is the problem here!
    /rant

    So reinstalling Yellow Dog and RapidMind by their instructions works fine. But I do have a couple gripes which may just be standard Linux issues that I never realized until now for some reason:

      If I sudo up to root, then I should be able to do anything that root could do if root logged in! But I can’t! I can’t boot into the ps3 game OS, I can’t even add groups! What the hell! Completely ridiculous.
      (Not a Linux problem) X/Gnome looks like shit on my monitor. Holy crap I never realized how bad the analog connections on the PS3 look until I fired up X. If you have the choice and can get a digital connection and a monitor that suports HDCP, buy it. I’ll put up a screenshot of how miserable this looks sooner or later.

    I’m sure there are other things, and I’ll put them up as I remember them. Asides from those little gripes, the environment is actually very nice. Here’s a nice how-to that helped me out a bit: CellPerformance.Com. I’d link directly to the RapidMind page too, but you have to login for that (although it is free to sign up).

    Anywho, we’re gonna benchmark the Cell versus a slightly-faster-than-average PC and do some other cool stuff with that. Check back later for how that turns out.

    Setting up Linux on the Cell (Part 1)

    So we got a PS3 for research in the lab the other day and we got to set up Linux on it. It’s actually pretty painless, although if you can, get a monitor that supports DVI and supports HDCP so you can get the digital connection. We only have analog in the lab so it looks like crap. Seriously though, it looks like feces and I have to use SSH to get anything done. That being said, here’s how to do it:

    This actually isn’t a how-to or anything. We just followed the directions at the IBM website and it was all set up. I actually wanted to set up RapidMind to fiddle with parallelizing applications in an “easier” fashion (check back later to see if it actually was easier) and needed to set up libspe2. Libspe1 comes with Cell, but to set up libspe2, you’ll need to install the Cell SDK 2.1 (2.0 comes with Cell and the IBM link above, use this link for 2.1). I’m installing it right now, but if it doesn’t work, I’ll let you know, and if it gets really bad, it will become a how-to. Either way, I’m looking at using Cell for multigrid method solving and visualization, or maybe something unrelated, but I’ll let you know.

    UPDATE (10/30/07): Turns out installing SDK 2.1 went terrible. More on this later.

    profile for Chris Bunch at Stack Overflow, Q&A for professional and enthusiast programmers