Friday, August 28, 2009

Programming high-performance applications on the Cell BE processor, Part 1: An introduction to Linux on the PLAYSTATION 3

The Sony® PLAYSTATION® 3 (PS3) is the easiest and cheapest way for programmers to get their hands on the new Cell Broadband Engine™ (Cell BE) processor and take it for a drive. Discover what the fuss is all about, how to install Linux® on the PS3, and how to get started developing for the Cell BE processor on the PS3.

The PLAYSTATION 3 is unusual for a gaming console for two reasons. First, it is far more open than any previous console. While most consoles do everything possible to prevent unauthorized games from running on them, the PS3 goes in the other direction, even providing direct support for installing and booting foreign operating systems. Of course, many of the game-related features such as video acceleration are locked out for third-party operating systems, but this series focuses on more general-purpose and scientific applications anyway.
The real centerpiece of the PS3, however, is its processor: the Cell Broadband Engine chip (often called the Cell BE chip). The Cell BE architecture is a radical departure from traditional processor designs. The Cell BE processor consists of nine processing elements. On the PS3, one of them is disabled and another is reserved for system use, which leaves seven processing units at your disposal: the PPE and six SPEs. The main processing element is a fairly standard general-purpose processor. It is a dual-threaded Power Architecture™ core called the Power Processing Element, or PPE for short. The other eight processing elements, however, are a different story.
The other processing elements within the Cell BE are known as Synergistic Processing Elements, or SPEs. Each SPE consists of:
  • A vector processor, called a Synergistic Processing Unit, or SPU
  • A private memory area within the SPU called the local store (the size of this area on the PS3 is 256K)
  • A set of communication channels for dealing with the outside world
  • A set of 128 registers, each 128 bits wide (each register is normally treated as holding four 32-bit values simultaneously)
  • A Memory Flow Controller (MFC) which manages DMA transfers between the SPU's local store and main memory
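If the idea of one register holding four values at once is new to you, the following plain C sketch may help. It is a host-side model and not SPU code. The vec128 union and the vec128_add_* functions are hypothetical names made up for illustration. On an actual SPU, the spu_add intrinsic from spu_intrinsics.h performs the lane-wise add in a single instruction.

```c
#include <stdint.h>

/* Hypothetical model of one 128-bit SPU register, viewed either as
   16 bytes, four 32-bit integers, or four single-precision floats.
   This is plain host C, not SPU code. */
typedef union {
    uint8_t bytes[16];
    int32_t i32[4];
    float   f32[4];
} vec128;

/* Lane-wise add of four 32-bit integers. Each lane is independent,
   which is what one SIMD add instruction does in hardware. */
static vec128 vec128_add_i32(vec128 a, vec128 b)
{
    vec128 r;
    for (int lane = 0; lane < 4; lane++)
        r.i32[lane] = a.i32[lane] + b.i32[lane];
    return r;
}

/* The same idea for four single-precision floats. */
static vec128 vec128_add_f32(vec128 a, vec128 b)
{
    vec128 r;
    for (int lane = 0; lane < 4; lane++)
        r.f32[lane] = a.f32[lane] + b.f32[lane];
    return r;
}
```

The loop does four additions one after another. The point of the SPU's wide registers is that the hardware does all four in one step.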

The SPEs, however, lack most of the general-purpose features that you normally expect in a processor. They are fundamentally incapable of performing normal operating system tasks. They have no virtual memory support, don't have direct access to the computer's RAM, and have extremely limited interrupt support. These processors are wholly concentrated on processing data as quickly as possible.
Therefore, the PPE acts as the resource manager, and the SPEs act as the data crunchers. Programs on the PPE divide the work into tasks for the SPEs, and data is then passed back and forth between them.
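Here is a rough illustration of how a PPE program might divide a data set. It is a minimal C sketch that assumes the work is one large array of floats and that each SPE gets one contiguous slice. The partition_work function is hypothetical. It rounds slice boundaries to groups of four floats (16 bytes) because DMA transfers generally want 16-byte alignment.

```c
#include <stddef.h>

/* Hypothetical helper: split n floats among num_workers workers
   (for example, the six SPEs available under Linux on the PS3).
   Boundaries are rounded to multiples of 4 floats (16 bytes), so if
   the array itself is 16-byte aligned, every slice starts aligned.
   start[i] and count[i] receive the i-th worker's slice. */
static void partition_work(size_t n, size_t num_workers,
                           size_t start[], size_t count[])
{
    const size_t lanes = 4;                  /* floats per 16 bytes */
    size_t groups = (n + lanes - 1) / lanes; /* 16-byte groups, rounded up */
    size_t base = groups / num_workers;      /* groups every worker gets */
    size_t extra = groups % num_workers;     /* first `extra` workers get one more */
    size_t pos = 0;                          /* next unassigned group */

    for (size_t i = 0; i < num_workers; i++) {
        size_t g = base + (i < extra ? 1 : 0);
        size_t begin = pos * lanes;
        size_t end = (pos + g) * lanes;
        if (begin > n) begin = n;            /* clamp past-the-end slices */
        if (end > n) end = n;                /* last slice may be short */
        start[i] = begin;
        count[i] = end - begin;
        pos += g;
    }
}
```

For example, with six SPEs and 100 floats, the first SPE gets 20 floats and each of the other five gets 16.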
Connecting together the SPEs, the PPE, and the main memory controller is a bus called the Element Interconnect Bus. This is the main passageway through which data travels.
The most surprising part of this design is that the SPE's 256K local store is not a cache -- it is actually the full amount of memory that an SPE has to work with at a time for both programs and data. This seems like a disadvantage, but it actually gives several advantages:
  • Local store memory accesses are extremely fast compared to main memory accesses.
  • Accesses to local store memory can be predicted down to the clock cycle.
  • Moving data in and out of main memory can be requested asynchronously and predicted ahead of time.
Basically, it has all of the speed advantages of a cache. However, since programs use it directly and explicitly, they can be much smarter about how it is managed. A program can request data before it is needed, and then go on to perform other tasks while the transfer completes.
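The pattern in the last sentence is usually called double buffering. While the SPU works on one buffer, the next chunk is already being transferred into another. The sketch below shows the shape of that loop in plain C so it can run anywhere. The fetch_chunk function is a hypothetical stand-in that simply calls memcpy. On a real SPU it would be an asynchronous mfc_get, and you would wait on that buffer's DMA tag before reading from it.

```c
#include <stddef.h>
#include <string.h>

#define CHUNK 256   /* floats per buffer; tiny compared with 256 KB */

/* Hypothetical stand-in for starting a DMA. On an SPU this would be an
   mfc_get tagged with buf_index and would return immediately. */
static void fetch_chunk(float *dst, const float *src, size_t n, int buf_index)
{
    (void)buf_index;                 /* the tag only matters on real hardware */
    memcpy(dst, src, n * sizeof(float));
}

/* Sum n floats from "main memory" by double-buffering them through
   two small "local store" buffers. */
static double sum_double_buffered(const float *main_mem, size_t n)
{
    static float ls[2][CHUNK];       /* the two local-store buffers */
    double total = 0.0;
    int cur = 0;

    if (n == 0)
        return 0.0;

    /* Prime buffer 0 with the first chunk. */
    fetch_chunk(ls[cur], main_mem, n < CHUNK ? n : CHUNK, cur);

    for (size_t off = 0; off < n; off += CHUNK) {
        size_t len = (n - off) < CHUNK ? (n - off) : CHUNK;
        size_t next = off + CHUNK;

        /* Start the next transfer before touching the current data. */
        if (next < n) {
            size_t next_len = (n - next) < CHUNK ? (n - next) : CHUNK;
            fetch_chunk(ls[cur ^ 1], main_mem + next, next_len, cur ^ 1);
        }

        /* On an SPU you would wait on tag `cur` here before reading. */
        for (size_t i = 0; i < len; i++)
            total += ls[cur][i];

        cur ^= 1;                    /* swap buffers */
    }
    return total;
}
```

Because the copy here is synchronous, nothing actually overlaps on a host machine. The benefit only appears when the fetch is a real asynchronous DMA.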
While the Cell BE processor has been out for a while in specialized hardware, the PS3 is the first Cell BE-based device that has been affordable and readily available. And, with Linux, anyone who wants to can program it.

It runs Linux? How do I get it on there?

It is unusual for gaming consoles to allow foreign operating systems to be installed on them. Since consoles are usually sold at a loss, they are typically locked down so that games cannot run on them unless the publisher pays royalties to the console maker. Sony decided to open up the PS3 a little and allow third-party operating systems to be installed, with the caveat that they do not get accelerated graphics.
Because of this, you can now install Linux on the PS3. You have to jump through a few hoops, but it definitely works. Terra Soft Solutions has developed Yellow Dog Linux 5 in cooperation with Sony specifically for the PS3. Uniquely among distributions so far, it even offers support for people running it on the PS3. Yellow Dog Linux (also known as YDL) has been an exclusively PowerPC-based distribution since its inception, so it was not surprising that Sony contracted Terra Soft to develop the next version of YDL specifically for the PS3.
See below for instructions on installing the initial release of YDL 5 onto the PS3.

Preparing the PS3

To install Linux, you need several pieces of additional hardware:
  • A display and appropriate cabling
  • A USB keyboard
  • A USB mouse
  • A USB flash drive
There are a few gotchas to watch for with the display. First of all, the 20GB PS3 comes with only an analog composite RCA plug for attaching to TV-like output devices. You can convert it to VGA through a special cable (see Resources for more information), but unfortunately this operates only at 576x384. If you want higher resolutions, you'll have to use the HDMI port, which can lead to additional problems. HDMI can easily be converted to DVI through a cable, so you should be able to feed it to a DVI-compatible monitor, right? Well, no. There is a content-protection protocol called HDCP, and the PS3 will not send any data over its HDMI port to devices that are not HDCP-compliant. Therefore, unless your monitor is HDCP-compliant, you cannot use it to get digital output from the PS3, and you're stuck with 576x384 (though some have reported higher resolutions using component video output rather than composite).
To prepare the PLAYSTATION 3, perform the following steps:
  1. Connect the Ethernet cable to the PS3. Be sure the network has a DHCP server on it.
  2. If this is a fresh-from-the-factory PS3, go through the setup steps as it prompts you on your first bootup, including setting the language, time, and a username for the PS3 system.
  3. Go to Settings, then System Settings, and choose Format Utility.
  4. Select Format Hard Disk, and confirm your selection twice.
  5. Select that you want a Custom partitioning scheme.
  6. Select that you want to Allot 10GB to the Other OS. This will automatically reserve the remaining disk space for the PS3's game operating system. When finished, it will restart the system.
  7. When the system restarts, go to Settings then System Update.
  8. Choose Update via Internet.
  9. Follow the screens for the system update to download and install the latest system updates. Some screens only have cancel buttons, with no instructions on how to move forward. In order to move forward on those screens, use the X button on your controller.
  10. Once the PS3 restarts, it's ready to have Linux installed on it.

Preparing to install

Now you're ready to prepare the Linux side of things. Here are the steps you need to do on your own computer (not the PS3) to prepare for the installation:
  1. Download and burn the YDL 5 DVD ISO. There is no CD install -- the PS3 only takes DVDs.
  2. Download the PS3 OtherOS installer from Sony (see Resources) and save it as otheros.self. This is the file that runs on the PS3 game operating system to install foreign bootloaders. NOTE: otheros.self is no longer needed if you are using PS3 firmware 1.60 or higher.
  3. Download the YDL bootloader from Terra Soft (again, see Resources) and save it as otheros.bld. This will be the bootloader that the Sony installer will install.
  4. Insert a USB flash drive into your computer.
  5. At the top level of your flash drive, create a directory called PS3. Immediately under the PS3 directory, create another directory called otheros.
  6. Copy the last two files you downloaded, otheros.self and otheros.bld, into the PS3/otheros directory you just created on your flash drive.
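A misnamed directory is an easy mistake to make here, and a small pre-flight check can save a trip back to the PS3. The following is a minimal POSIX C sketch. check_otheros_layout is a hypothetical helper, and the mount point you pass in is wherever your system mounted the flash drive. It reports otheros.self as missing if it is absent, even though that file is not needed with firmware 1.60 or higher.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/stat.h>

/* Returns 1 if path exists and is a regular file. */
static int is_regular_file(const char *path)
{
    struct stat st;
    return stat(path, &st) == 0 && S_ISREG(st.st_mode);
}

/* Hypothetical check that the flash drive mounted at mount_point has
   PS3/otheros/otheros.bld and PS3/otheros/otheros.self.
   Returns 0 when both are present, otherwise the number missing. */
static int check_otheros_layout(const char *mount_point)
{
    static const char *files[] = { "otheros.bld", "otheros.self" };
    char path[4096];
    int missing = 0;

    for (size_t i = 0; i < sizeof files / sizeof files[0]; i++) {
        snprintf(path, sizeof path, "%s/PS3/otheros/%s", mount_point, files[i]);
        if (!is_regular_file(path)) {
            fprintf(stderr, "missing: %s\n", path);
            missing++;
        }
    }
    return missing;
}
```

If the layout is right, calling check_otheros_layout with the drive's mount point should return 0 before you unplug it.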
Now it is time to install.

Performing the installation

Perform the following steps on the PS3 to install Linux onto it:
  1. Remove the flash drive from your computer and insert it into the PS3.
  2. Go to Settings, then System Settings, and then choose Install Other OS.
  3. Confirm the location of the installer, and follow the screens for the installation process. Note that this only installs the bootloader, not the operating system.
  4. When the installer finishes, go to Settings, then System Settings, and select Default System. Then choose Other OS and press the X button.
  5. Insert the YDL 5 DVD.
  6. Plug in your USB keyboard and mouse.
  7. Now restart the system. You can either do this by holding down the PS button on the controller and then choosing Turn off the system, or by simply holding the power button down for five seconds. Then turn the system back on.
  8. When it boots back up, it will look like it is booting Linux. That's because the bootloader is actually a really stripped down Linux kernel called kboot.
  9. When it gets to the kboot: prompt, type install if your output is going through the HDMI port, or installtext if you are going analog. The remaining instructions assume you used the installtext option, but there is little difference.
  10. After media verification it may give a Traceback error in the blue area of the screen. Just ignore this and proceed through the installation screens.
  11. When it asks about partitioning, don't be concerned about it erasing the PS3 game operating system. The PS3's Other OS mode only allows the guest operating system to see its own portion of the drive. Even low-level utilities cannot see the other parts of the drive. So go ahead and let YDL erase all of the data on your drive, and then let it remove all of the partitions and create a default layout.
  12. When it gets to the package installation, it takes approximately one hour to install the packages. However, it does not install the whole DVD.
  13. When it reboots, if you are using analog output, you need to type in ydl480i at the kboot: boot prompt. Otherwise it will likely change the output to a resolution that the analog output isn't capable of.
  14. When it boots, it will bring up a setup tool. There is nothing you really need to do here. If you don't do anything, it will time out and finish the bootup process.
And there you have it! YDL 5 is now on your PS3!

Post-install setup

Unfortunately, the installation program doesn't take care of all of the details, especially for analog displays. You still have several steps to do if you want to do things like automatically boot at the proper resolution, configure the X Window System on an analog device, and install the Cell BE SDK. For all of these, go ahead and make sure your YDL 5 DVD is in the drive, and mount it like this:
mount /dev/dvd /mnt 

All of the instructions will assume the install DVD is mounted in this way, and that you are logged in as root.
To get an analog system to boot into its proper resolution at startup, edit the file /etc/kboot.conf, and change the line which reads default=ydl to default=ydl480i and save the file.
If you want to configure the X Window System for your analog device, you need to install and run the Xautoconfig package like this:
rpm -i /mnt/YellowDog/RPMS/Xautoconfig-*
Xautoconfig

Now you can start the X Window System by running startx, though on an analog device your screen is pretty tiny. Here's a quick hint to help you get around on such a tiny device: holding down alt+left mouse button will allow you to drag screens around on your desktop, even if you can't see the title bar.
If you want your system to have a graphical login at system boot, you need to edit the /etc/inittab file. Change the line id:3:initdefault: so that it says id:5:initdefault: and save the file. Now when you reboot the system, you will have a nice graphical login. Remember after you reboot to mount the DVD as shown above for the remaining steps. Note that Nautilus actually mounts it in a different location, so if you use Nautilus to mount your DVD, it will be mounted on /media/CDROM rather than /mnt.
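Both of these edits are whole-line substitutions. One changes default=ydl to default=ydl480i in /etc/kboot.conf, and the other changes id:3:initdefault: to id:5:initdefault: in /etc/inittab. A text editor is the normal way to make them. Purely to illustrate what each change amounts to, here is a minimal C sketch. The replace_line_in_file helper is hypothetical and assumes lines shorter than 1024 characters. If you try it, run it as root and only on a file you have backed up.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: replace every line that is exactly old_line with
   new_line in the text file at path. The result is written to a
   temporary file, which is then renamed over the original.
   Returns the number of lines replaced, or -1 on I/O error. */
static int replace_line_in_file(const char *path,
                                const char *old_line, const char *new_line)
{
    char tmp_path[4096];
    char line[1024];
    int replaced = 0;

    snprintf(tmp_path, sizeof tmp_path, "%s.tmp", path);
    FILE *in = fopen(path, "r");
    if (!in)
        return -1;
    FILE *out = fopen(tmp_path, "w");
    if (!out) {
        fclose(in);
        return -1;
    }

    while (fgets(line, sizeof line, in)) {
        /* Strip the newline for comparison, remembering if it was there. */
        size_t len = strcspn(line, "\n");
        int had_newline = line[len] == '\n';
        line[len] = '\0';

        if (strcmp(line, old_line) == 0) {
            fputs(new_line, out);
            replaced++;
        } else {
            fputs(line, out);
        }
        if (had_newline)
            fputc('\n', out);
    }

    fclose(in);
    if (fclose(out) != 0)
        return -1;
    /* Replace the original file with the rewritten copy. */
    if (rename(tmp_path, path) != 0)
        return -1;
    return replaced;
}
```

For example, replace_line_in_file("/etc/kboot.conf", "default=ydl", "default=ydl480i") should return 1 if the line is present exactly as written.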
Next, install the Cell BE SDK V2.0. To check whether the installer already installed it, run which spu-gcc. If the program cannot be found, the SDK was not installed. To install it, do the following:
cd /mnt/YellowDog/RPMS
rpm -i spu-binutils-* spu-gcc-* spu-gdb-* spu-utils* libspe-devel-* 

However, one important set of packages was not included on the DVD: the 64-bit version of libspe. That is easily remedied. Get the SRPM of libspe either from the source DVD or from the Web site (it is called libspe-1.1.0-1.src.rpm). Then go to the directory you downloaded it into and perform the following steps:
rpm -i libspe-*.src.rpm
cd /usr/src/yellowdog/SPECS
rpmbuild -bb --target ppc64 libspe.spec
cd ../RPMS/ppc64
rpm -i elfspe-* libspe-*

Now you're all set. YDL 5 is installed, configured, and ready to go!
Some of you might be wondering how to get back into the game operating system, on the off chance that you want to play a game or two on your PS3. To do this, run boot-game-os from either the kboot: prompt or the command line. If for some reason Linux is causing errors and won't load, you can still reach the game operating system. Power off the PS3, then power it back on while holding down the power button for five seconds (until you hear a beep). Either method loads the game operating system, but it also makes the game operating system the default system. So, to boot back into Linux, you'll have to go back into the settings and set Other OS as the default system again.

Okay, I've got Linux installed. Now what?

Now that you have Linux and the Cell BE SDK fully installed, the rest of this series will be about programming and using it. As a teaser, though, here is a short introductory C program that uses both the PPE and an SPE.
Before looking at how this works, take a look at some of the common tools used to build Cell BE programs:
  • gcc
    Our trusty compiler, built for generating PPC Linux binaries for the PPE. Use the -m64 switch to generate 64-bit executables.
  • spu-gcc
    This is also our trusty compiler, but this one generates code for the SPEs.
  • embedspu
    This is a special tool that converts SPE programs into an object file that can be linked into a PPE executable. It also creates a global variable that refers to the SPE program so that the PPE can load the program into the SPEs and run the program as needed. To embed into 64-bit PPC programs, use the -m64 flag.
Without the SPEs, the Cell BE processor is programmed essentially like any other PowerPC-based system. In fact, you can pretend that they don't exist and your code will work just fine. Doing this, however, will leave much of your computing power untapped. To take advantage of the SPEs, you will have to put in just a little more effort.
If you are new to Cell BE technology, remember that the PPE is the resource manager for the system. It handles operating system tasks, regulates access to memory, and controls the SPEs. The code for the PPE takes care of initializing the program, setting up one or more SPEs with tasks, and performing input and output. Of course, the PPE can also perform processing tasks as well, but generally the point is to offload all that is reasonable to the SPEs.
So, take a look at how a simple program is constructed to perform processing tasks on the SPE. The program will be very elementary -- it will calculate the distance travelled given a speed in miles-per-hour and a time in hours. Here is the code for the PPE (enter as ppe_distance.c):

Listing 1. Equation solver PPE code

#include <stdio.h>
#include <libspe.h>

//This global is for the SPE program code itself.  It will be created by
//the embedspu program.
extern spe_program_handle_t calculate_distance_handle;

//This struct is used for input/output with the SPE task
typedef struct {
    float speed;     //input parameter
    float num_hours; //input parameter
    float distance;  //output parameter
    float padding;   //pad the struct to a multiple of 16 bytes
} program_data;

int main() {
    program_data pd __attribute__((aligned(16)));  //aligned for transfer

    printf("Enter the speed at which your car is travelling in miles/hr: ");
    scanf("%f", &pd.speed);
    printf("Enter the number of hours you have been driving at that speed: ");
    scanf("%f", &pd.num_hours);

    //Create SPE Task
    speid_t spe_id = spe_create_thread(0, &calculate_distance_handle, &pd, NULL,
                                       -1, 0);
    //Check For Errors
    if(spe_id == 0) {
        fprintf(stderr, "Error creating SPE thread!\n");
        return 1;
    }
    //Wait For Completion
    spe_wait(spe_id, NULL, 0);

    printf("The distance travelled is %f miles.\n", pd.distance);
    return 0;
}
As mentioned before, the main job of the PPE in the Cell BE processor is to handle input and output. The only really interesting part is spe_create_thread. Its parameters are, in order:
  • a thread group ID (zero means a new group should be created for the thread)
  • the handle to the SPE program
  • a pointer to the data you want to transfer
  • an optional environment pointer
  • a mask of the SPEs you are willing to run the program on (-1 means any available SPE)
  • a set of option flags (in this case, you don't want any)
The function returns the SPE task ID, which you then pass to spe_wait. spe_wait returns when the SPE task terminates.
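If you have written threaded code before, spe_create_thread and spe_wait will feel familiar. They play roughly the roles that pthread_create and pthread_join play for ordinary threads. The following host-side C sketch makes that analogy concrete by using POSIX threads in place of libspe, so it runs on any Linux machine. The distance_worker and run_distance_task names are hypothetical. There is one important difference. A POSIX thread can dereference the pointer it is handed, but an SPE cannot, which is why the SPE code that follows has to use DMA.

```c
#include <pthread.h>

/* Same layout as program_data in Listing 1. */
typedef struct {
    float speed;      /* input */
    float num_hours;  /* input */
    float distance;   /* output */
    float padding;    /* pads the struct to 16 bytes */
} program_data;

/* Plays the role of the SPE program: reads the inputs through the
   pointer it was given and writes the result back. */
static void *distance_worker(void *arg)
{
    program_data *pd = arg;
    pd->distance = pd->speed * pd->num_hours;
    return NULL;
}

/* Host analogue of spe_create_thread followed by spe_wait.
   Returns 0 on success, nonzero if the thread could not be run. */
static int run_distance_task(program_data *pd)
{
    pthread_t tid;
    if (pthread_create(&tid, NULL, distance_worker, pd) != 0)
        return 1;                    /* like spe_create_thread failing */
    return pthread_join(tid, NULL);  /* like spe_wait */
}
```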
Here is the code for the SPE (enter as spe_distance.c):

Listing 2. SPE calculation example

//Pull in DMA commands
#include <spu_mfcio.h>

//Struct for communication with the PPE
typedef struct {
    float speed;     //input parameter
    float num_hours; //input parameter
    float distance;  //output parameter
    float padding;   //pad the struct to a multiple of 16 bytes
} program_data;

int main(unsigned long long spe_id, unsigned long long program_data_ea,
         unsigned long long env) {
    program_data pd __attribute__((aligned(16)));
    int tag_id = 0;

    //Initiate copy
    mfc_get(&pd, program_data_ea, sizeof(pd), tag_id, 0, 0);
    //Wait for completion
    mfc_write_tag_mask(1<<tag_id);
    mfc_read_tag_status_all();

    pd.distance = pd.speed * pd.num_hours;

    //Initiate copy
    mfc_put(&pd, program_data_ea, sizeof(program_data), tag_id, 0, 0);
    //Wait for completion
    mfc_write_tag_mask(1<<tag_id);
    mfc_read_tag_status_all();

    return 0;
}

Future articles examine SPU programs in more depth, but here's a quick rundown of what is happening. The pointer passed as the third parameter to spe_create_thread comes into this program as program_data_ea. EA stands for effective address, which is a main memory address as seen by the PPE program. Since the SPE does not have direct access to main memory, you cannot dereference this value as a pointer. Instead, you must initiate a transfer request to copy the data into your local store. Once it is in your local store, you can access the data through its local store address, sometimes abbreviated as LSA.
mfc_get initiates the transfer into the local store. Notice that in both the PPE and the SPE code, the struct is aligned to 16 bytes and padded to a multiple of 16 bytes. We will deal with this more in a later article, but for the most part DMA transfers must be aligned to a 16-byte boundary and sized as a multiple of 16 bytes. The tag_id lets you check the status of the DMA operation. After the transfer is initiated, the next two functions cause the program to wait until it has completed.
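The alignment rules can be written down as a small predicate. This sketch is based on our reading of IBM's Cell BE documentation, so treat it as an assumption to verify rather than a reference. A single DMA transfer may move 1, 2, 4, or 8 bytes, or a multiple of 16 bytes up to 16 KB. Transfers of 16 bytes or more need 16-byte-aligned addresses. Smaller transfers must be naturally aligned and must sit at the same offset within a 16-byte block in both local store and main memory. dma_transfer_ok is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical checker for MFC DMA size and alignment rules, as
   described above. ea is the effective (main memory) address, lsa the
   local store address, size the transfer length in bytes. */
static bool dma_transfer_ok(uint64_t ea, uint32_t lsa, uint32_t size)
{
    /* Small transfers: naturally aligned, same offset within 16 bytes. */
    if (size == 1 || size == 2 || size == 4 || size == 8) {
        return (ea % size == 0) && (lsa % size == 0)
            && ((ea & 0xF) == (lsa & 0xF));
    }
    /* Otherwise: a nonzero multiple of 16 bytes, at most 16 KB. */
    if (size == 0 || size % 16 != 0 || size > 16 * 1024)
        return false;
    /* Both addresses must be 16-byte aligned. */
    return (ea % 16 == 0) && (lsa % 16 == 0);
}
```

The program_data struct in the listings is exactly 16 bytes. So dma_transfer_ok(ea, lsa, sizeof(program_data)) holds whenever both copies are 16-byte aligned, which is what the aligned(16) attributes are there to guarantee.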
The main processing is trivially simple -- just multiplying the speed and the time. After the data is processed, mfc_put initiates a transfer back into main memory, and the next two functions cause you to wait for DMA completion. When all of that is done, the program exits.
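To make the split between the two kinds of addresses concrete, here is a host-side C model of Listing 2's data flow. It treats the effective address purely as a 64-bit integer that the "SPE" logic only ever hands to transfer functions and never dereferences itself. model_get and model_put are hypothetical stand-ins for mfc_get and mfc_put. They copy with memcpy and complete immediately, so there is no tag wait. On a real SPE, the local variable pd would itself live in the 256 KB local store.

```c
#include <stdint.h>
#include <string.h>

/* Same layout as program_data in Listings 1 and 2. */
typedef struct {
    float speed, num_hours, distance, padding;
} program_data;

/* Hypothetical stand-in for mfc_get: copy size bytes from effective
   address ea into the local buffer ls. */
static void model_get(void *ls, uint64_t ea, size_t size)
{
    memcpy(ls, (const void *)(uintptr_t)ea, size);
}

/* Hypothetical stand-in for mfc_put: copy size bytes from the local
   buffer ls back out to effective address ea. */
static void model_put(const void *ls, uint64_t ea, size_t size)
{
    memcpy((void *)(uintptr_t)ea, ls, size);
}

/* Listing 2's logic against the model: the EA is only a transfer
   source and destination, and all computation uses the local copy. */
static int spe_main_model(uint64_t program_data_ea)
{
    program_data pd;                       /* "local store" copy */
    model_get(&pd, program_data_ea, sizeof pd);
    pd.distance = pd.speed * pd.num_hours;
    model_put(&pd, program_data_ea, sizeof pd);
    return 0;
}
```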
Now you have to compile and run the program. That is actually pretty simple:
#Compile the SPE program
spu-gcc spe_distance.c -o spe_distance
#Embed the SPE program into an ELF object file, and expose it
#through the global variable: calculate_distance_handle
embedspu calculate_distance_handle spe_distance spe_distance_csf.o
#Compile the PPE program together with the SPE program
gcc ppe_distance.c spe_distance_csf.o -lspe -o distance
#Run the program
./distance
And there you have it! A fully working Cell BE program.


While you can't program the PS3 directly, its support for third-party operating systems allows you to install Linux on your PS3. Installing Linux on the PS3 takes a little bit of effort, but in the end you get a low-cost, fully working Cell Broadband Engine processor. Future articles in this series go in depth into programming the Cell BE and extracting every ounce of speed that you can from the SPEs.


