PlayStation 3 Architecture

Supporting imagery

The original PlayStation 3 or 'PS3'
Released on 11/11/2006 in Japan, 17/11/2006 in America and 23/03/2007 in Europe

The PS3 2000/3000 series (a.k.a. 'Slim')
Released on 01/09/2009 in Europe and America; and 03/09/2009 in Japan

The PS3 4000 series (a.k.a. 'Super Slim')
Released on 09/2012 internationally

Main architecture diagram

A quick introduction

In 2006, Sony unveiled the long-awaited ‘next generation’ video-game console, a shiny (albeit heavy) machine whose underlying hardware architecture continues the teachings of the Emotion Engine, that is, focus on vector processing to achieve power, even at the cost of complexity. Meanwhile, their new ‘super processor’, the Cell Broadband Engine, is conceived during a crisis of innovation and will have to keep up as trends for multimedia services evolve.

This write-up takes a deep look at Sony, IBM, Toshiba and Nvidia’s joint project, along with its execution and effect on the industry.

On the article’s length

I’m afraid this article is not the typical ‘lunchtime’ one that I usually write for other consoles in this series. If you are interested in every area of the Playstation 3, you are in for the whole journey! Having said that, this writing encompasses ~6 years of research and development carried out by countless engineers, so I don’t expect you to digest it all at once. Please take your time (and breaks if needed) and if at the end you are hungry for more, help yourself at the ‘Sources’ section!

Welcome to the most recognisable and innovative part of this console.

Introduction

The PS3’s CPU is massively complex, but it’s also a fascinating work of engineering that intersects complex needs and unusual solutions, prominent in an era of change and experimentation. So, before we step into the internals of the PS3’s CPU, I wrote the following paragraphs to bring some historical context into the article. Consequently, we’ll be able to decompose the chip from top to bottom in a way that lets you not only understand how this chip works, but also grasp the reasoning behind major design decisions.

The PS1’s CPU (1994).
Designed by LSI and Sony, using MIPS' technology.

The PS2’s Emotion Engine (2001).
Designed by Toshiba, with MIPS' technology, again.

Almost ten years after the introduction of the original MIPS-powered Playstation, we find ourselves in the early noughties, and things are not looking good for SGI/MIPS. Nintendo recently ditched them for a low-end PowerPC core with IBM as their new supplier while Microsoft, the newcomer in this market, chose Intel and their x86 empire.

Sony has a history of grabbing existing low-end designs (cheap MIPS cores) and moulding them to achieve acceptable 3D performance at a reduced cost, a process that involved other companies like LSI (for the PS1’s CPU) and Toshiba (for the PS2’s Emotion Engine). This methodology carried on until 2004 with the release of the Playstation Portable. So, what new MIPS amalgam were they going to build for the PlayStation 3?

Well, it turns out the development of the Playstation 3 predates the Playstation Portable one [1]. In 2000, months after the PS2’s release, Sony formed an alliance with IBM and Toshiba called ‘STI’ with the sole goal to deliver the next chip that could power the next generation of supercomputers [2]. If this didn’t sound extravagant enough, the next chip would also be used on the successor of the PS2. In the end, in 2004, IBM unveiled the Cell Broadband Engine (also known as ‘Cell BE’ or just ‘Cell’) [3].

A glance at Cell

Having explained all that history and theory, I think we are ready to bring forward the protagonist of this section. This is Cell:

The Cell Broadband Engine (PS3 variant).
Designed by IBM for supercomputing and scientific simulation.
The crossed out ‘SPE’ means it’s disabled (unusable).
The other ‘SPE’ on the left is reserved to the operating system.

… and by the end of this section, you’ll know what each component does.

Overall structure

Cell runs at a mighty 3.2 GHz and it’s composed of a multitude of components. So, for the sake of this analysis, this CPU can be divided into three main areas [6]:

The leader: this is the part of Cell that directs the rest of the circuitry. Here we find a component called PowerPC Processing Element (PPE).
The assistants: these are as crucial as the PPE, but their capabilities are limited to an assistant/accelerator role. This group comprises eight Synergistic Processing Elements (SPEs).
The interfaces: As the need for bandwidth surges exponentially, newer interfaces are implemented to move data around without producing bottlenecks. In the interfaces group, we find a handful of protocols: the Element Interconnect Bus (EIB), the Broadband Engine Interface Unit (BEI), the Memory Interface Controller (MIC) and the Flex I/O buses.

This information will be revisited throughout the article in more depth, so you don’t have to memorise these names. The main goal of this section is to let the reader get a mental image of the nature of Cell and become familiar with all the components we’ll be discussing in due time.

How this study is organised

Seeing the previous structure, I had to organise this so you don’t get fed up with lots of information. Thus, we are going to analyse Cell by studying each component in this order:

The bus connecting all components, the Element Interconnect Bus (EIB).
The PowerPC Processing Element (PPE) and its core element, the PowerPC Processing Unit (PPU).
What general-purpose memory is available in this console.
The Synergistic Processing Elements (SPE) and their core element, the Synergistic Processing Unit (SPU).
The programming model devised to program Cell efficiently.

That being said, let’s begin the real analysis.

Inside Cell: The heart

Since its announcement, Cell has been referred to as a Network-on-Chip (NoC) [7] instead of the traditional System-on-Chip (SoC) definition. This is attributed to Cell’s unorthodox data bus, the Element Interconnect Bus (EIB). We’ve seen so far how demanding CPU components can be, in addition to how susceptible a system is to bottlenecks. Well, to tackle this for the eleventh time, IBM has devised a new design… and has documented it using terms analogous to road driving.

Simplified diagram of the Element Interconnect Bus (EIB).
Each arrow between ‘Ramps’ (nodes) represents two unidirectional buses. Thus, each node is connected to the next one using four channels.

The EIB is made of twelve nodes called Ramps, each one connecting one component of Cell. Ramps are interconnected using four buses: two of them travel clockwise and the other two do so anti-clockwise. Each bus (or channel) is 128 bits wide. Having said that, instead of resorting to single-bus topologies (like the Emotion Engine and its precursor did), ramps are interconnected following the token ring topology, where data packets must cross through all neighbours until they reach the destination (there’s no direct path). Considering the EIB provides four channels, there are four possible routes (rings).

Now, you may think, what’s the point of a token ring if data may end up travelling longer paths (compared to a single direct bus)? Well, a single bus is highly susceptible to congestion. Hence, EIB’s engineers decided on this topology to tackle large amounts of concurrent traffic (keep reading if you want to know how the token ring helped).

Data is transferred in the form of 128-bit packets [8]. Each ring can carry up to three concurrent transfers as long as the packets don’t overlap. The EIB operates with the use of command credits, in other words, whenever a component needs to initiate a transfer, it sends a request to the EIB’s Data Arbiter, which manages the traffic within the rings. The moment the request is approved, packets enter the ring and receive a ‘token’, which the data arbiter uses as meta-data to supervise the transfer. Moreover, some components have preferential priority over others, like the Memory Interface Controller (MIC) component, which is where main RAM is located. Finally, the data arbiter will never place packets at rings whose path is longer than half of the ring.
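The ‘no longer than half of the ring’ rule can be pictured with a small sketch. It assumes the twelve ramps are numbered 0 to 11 in clockwise order, which is a numbering I made up for illustration (as are the function names). It only models the distance check, not the real arbiter with its credits and priorities.

```python
RAMPS = 12  # number of ramps on the EIB, as stated above


def hops(src, dst, clockwise):
    """Ramp-to-ramp hops needed to go from src to dst in one direction."""
    distance = (dst - src) % RAMPS
    return distance if clockwise else (RAMPS - distance) % RAMPS


def allowed_directions(src, dst):
    """Directions whose path does not exceed half of the ring."""
    limit = RAMPS // 2
    options = []
    if hops(src, dst, clockwise=True) <= limit:
        options.append("clockwise")
    if hops(src, dst, clockwise=False) <= limit:
        options.append("anti-clockwise")
    return options


print(allowed_directions(0, 3))   # three hops clockwise, nine the other way
print(allowed_directions(0, 10))  # ten hops clockwise, two the other way
```

With this numbering, only a destination sitting exactly opposite the source (six hops away) can be reached in either direction.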

Each ramp plays a part in the transfer: it reads the destination address of the packet to know whether to post data to its respective component or forward it to the next ramp. During each clock cycle, ramps can receive and send 128-bit (16 Bytes) packets at the same time. So, considering there are four channels and the EIB operates at 1.6 GHz (half of Cell’s speed), the theoretical maximum transfer rate is 16 Bytes x 2 transfers/clock x 4 rings x 1.6 GHz = 204.8 GB/s. This value is of course too optimistic, as there are many other external factors (e.g. origin/destination path, state of the bus, etc.) conditioning the performance. In any case, many research papers by IBM and other authors have compiled more realistic speeds using practical experiments [9].
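For those who prefer to see the arithmetic spelled out, here is the same peak-rate calculation as a tiny script. The constant names are mine; the values are the ones quoted in the paragraph above.

```python
BYTES_PER_PACKET = 16             # one 128-bit packet
TRANSFERS_PER_CLOCK = 2           # a ramp can send and receive in the same cycle
RINGS = 4                         # two clockwise, two anti-clockwise
EIB_CLOCK_HZ = 1_600_000_000      # half of Cell's 3.2 GHz


def eib_peak_bytes_per_second():
    """Theoretical ceiling only; real traffic lands well below this."""
    return BYTES_PER_PACKET * TRANSFERS_PER_CLOCK * RINGS * EIB_CLOCK_HZ


print(f"{eib_peak_bytes_per_second() / 1e9:.1f} GB/s")
```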

Now that you’ve seen how every component of Cell is interconnected, it’s time to check out the first component of this chip…

Inside Cell: The leader

Here we will take a look at the ‘main part’ of Cell. That is the part of the silicon that is in charge of commanding the rest. The component’s name is PowerPC Processing Element (PPE) and you can think of it as the MIPS R5900 of the Emotion Engine.

Composition of the PPE

Remember how I divided Cell into different areas before? Well, the same can be done with the PPE. IBM uses the term ‘element’ to describe the independent machine [10], but once inside it uses the term ‘unit’ to separate the core circuitry from the interfaces that communicate with the rest of Cell.

Simplified diagram of the PowerPC Processing Element (PPE)

Having said that, the PowerPC Processing Element is surprisingly composed of two parts:

PowerPC Processing Unit (PPU): This is the logical part of the PPE (the ‘core’). Don’t forget this is not Nintendo’s PPU! (even though they are sharing the same letters of the Latin alphabet… in the same order…).
PowerPC Processor Storage Subsystem (PPSS): The big interface that connects the PPU to the outside world. Furthermore, it provides a whopping 512 KB of L2 cache.

As you can see, the design of the PPE (and the rest of Cell) is pretty modular, which follows the teachings of RISC design. You’ll soon see that the modularity is applied even inside the PPU.

The PowerPC Processing Unit

We are going to take a look at the insides of the PPU just now. To recap, we’ve dived into Cell, then into the PPE, and finally into the PPU. We’ll analyse the PPU just like any other CPU core.

To start with, the PPU is not built from the ground up but re-purposes existing PowerPC technology. However, unlike previous iterations where IBM grabbed an existing processor and half-updated it to meet new requirements, the PPE doesn’t succeed any previous CPU design. Instead, IBM constructed a new CPU that follows version 2.02 of the PowerPC specification (which happens to be the last PowerPC spec before being rebranded to ‘Power ISA’). To sum up, you won’t find the same design of the PPU on any existing chip from that date, yet, it’s programmed using the same machine code as in other PowerPC chips.

That being said, why did IBM choose PowerPC technology to develop a high-performance chip? Simple: PowerPC is a mature platform [11] that enjoyed ~10 years of Macintosh user-base testing and revisioning. It ticks all the boxes in Sony’s list and, if the need arises, it can be adapted to different environments. Last but not least, the use of a well-known architecture is good news for existing compilers and codebases which, for a new console, is a big starting advantage.

It’s worth mentioning that IBM was one of the authors of the first PowerPC chips, along with Motorola and Apple (recall the AIM alliance). Be that as it may, towards the early 00s, the so-called alliance members were already working separately, with Motorola/Freescale developing a different PowerPC series from IBM.

PPU’s building blocks

By applying a ‘microscopic’ view to the PPU, we can observe this unit is composed of different blocks or sub-units which perform independent operations (load values from memory, carry out arithmetic, etc). The capabilities of the PPU are defined by what each block can do and how:

Simplified diagram of the Instruction Unit (IU)

The first block is called Instruction unit (IU) and as its name suggests, it pulls instructions from L2 cache and signals other units to perform the requested operation. Like its i686 contemporaries, part of the instruction set is interpreted with the use of microcode (the IU embeds a small ROM for this purpose). Lastly, the IU also houses 32 KB of L1 cache for instructions.

Instruction issuing is carried out with a 12-stage pipeline, though in practice the total number of stages will greatly vary depending on the type of instruction. For instance, the branch prediction block may bypass great parts. If we combine the IU with the neighbour units, the final number of stages is often close to 24 (yes, it’s a big number, but remember Cell runs at 3.2 GHz).

Now for the interesting part: the IU is dual-issue. In some cases, the IU will dispatch up to two instructions at the same time, consequently improving throughput greatly. In practice, however, there are many conditions for this to work, so programmers/compilers are responsible for optimising their routines so their sequence of instructions can take advantage of this function. By the way, dual-issuing has been implemented by other CPUs as well, and the term varies between vendors, so here I used IBM’s definition.

Furthermore, to top it off, the IU is also multi-threaded, where the unit can execute two different sequences of instructions (called ‘threads’) at the same time. Behind the scenes, the IU is just alternating between the two threads at each cycle, giving the appearance of multi-threading. This behaviour overlaps with what Intel currently defines as hyper-threading; it’s possible that the latter term wasn’t coined yet. Nevertheless, IBM’s multi-threading mitigates unwanted effects like pipeline stalls, since the CPU will no longer be blocked if one instruction jams up the flow. To accomplish multi-threading, IBM engineers duplicated the internal resources of the IU, which includes general-purpose registers (previously I said there are 32 registers available; that’s per thread, so in reality there are 64 in total!). However, resources that don’t belong to the PowerPC specification (such as L1 and L2 cache; and the interfaces) are still shared. Thus, the latter group is single-threaded.
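To picture how alternating between two threads hides stalls, here is a toy model. It is a deliberate simplification of my own: it issues one instruction per cycle (dual-issue is ignored), and the ‘stall’ marker and the function name are hypothetical, not anything taken from IBM’s documentation.

```python
STALL = "stall"  # hypothetical marker: the thread cannot issue during this cycle


def interleave(thread_a, thread_b):
    """Alternate between two instruction streams, one issue slot per cycle.

    When the preferred thread is stalled or has run out of work,
    the slot is handed to the other thread if it is ready.
    """
    queues = [list(thread_a), list(thread_b)]
    trace = []
    cycle = 0
    while queues[0] or queues[1]:
        ready = [True, True]
        for tid in (0, 1):
            if queues[tid] and queues[tid][0] == STALL:
                queues[tid].pop(0)  # the stall burns this cycle for that thread
                ready[tid] = False
        preferred = cycle % 2
        issued = None
        for tid in (preferred, 1 - preferred):
            if ready[tid] and queues[tid]:
                issued = (tid, queues[tid].pop(0))
                break
        trace.append(issued)  # None marks a cycle in which nothing was issued
        cycle += 1
    return trace


trace = interleave(["a0", STALL, STALL, "a1"], ["b0", "b1", "b2"])
print(trace)
```

In this run, thread 0 stalls for two cycles after its first instruction, yet no cycle goes to waste because thread 1 fills the gaps. Run thread 0 on its own and the same stalls show up as empty slots.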

All in all, dual-threading and dual-issuing complement each other: since the IU alternates between threads rather than issuing from both at once, the PPU can issue up to two instructions per cycle while keeping two threads in flight. Even though this is a ‘best-case scenario’, it still provides optimisation opportunities that users will ultimately notice in the game’s frame rate!

Wrapping the PPE up

You’ve just seen how the PPE works and what it is made of, but what does it mean for a developer?

After all, the PowerPC Processing Element is only a general-purpose processor, but here’s the thing: it’s not meant to work alone. Remember that wide main bus (the EIB)? IBM designed the PPE so engineers may combine it with other processors to accelerate particular applications (e.g. HPC, 3D graphics, security, scientific simulations, networking, video processing, etc.). Since this writing is about the PlayStation 3, you’ll see the rest of Cell is treated with computer graphics and physics in mind, and the rest of the article reflects that purpose.

Outside Cell: Main Memory

Let’s now step out of Cell for a moment, as it doesn’t matter how good the PPE is if we don’t have a proper working space (memory) to put it to work.

Thus, Sony fitted 256 MB of XDR DRAM on the motherboard… But, again, what does this mean? To answer that, we need to take a look at how the memory blocks work and how they connect to Cell.

Cell next to the four 64 MB XDR DRAM chips

First of all, the type of memory fitted is called Extreme Data Rate (XDR). You may recognise XDR DRAM as the successor of the jinxed RDRAM found in the Nintendo 64 and the Playstation 2. But don’t jump to conclusions just yet!

Rambus, like any other company, improves upon their inventions. Their third revision (XDR) now operates at octa-rate (four times the rate of its adversary, DDR DRAM) [13]. Latency doesn’t pose an issue anymore: if we take a look at one of its manufacturers’ data-sheets, XDR’s latency is reported between 28 ns and 36 ns [14], almost 10 times faster than the first-generation RDRAM chips.

The first revision of the Playstation 3’s motherboard contains four 64 MB chips, handled in pairs. XDR is connected to Cell using two 32-bit buses, one on each pair. So, whenever the PPU writes a word (64-bit data), it’s split between two XDR chips.

Cell connects to the XDR chips using the Memory Interface Controller (MIC), another component within Cell (like the PPE). Additionally, the MIC buffers memory transfers to improve bandwidth, but it has a big limitation: large byte alignment. In essence, the MIC’s smallest data size for transfers is 128 Bytes, which works well for sequential reads or writes. But if the data is smaller than 128 B or it requires alternating between writing and reading, performance penalties appear.
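To get a feel for the cost of that 128-byte granularity, the following sketch estimates how many bytes get moved for a given request. It rests on an assumption of mine: that every 128-byte block touched by the request is transferred whole. The function names are made up for the example.

```python
MIC_GRANULE = 128  # bytes, the smallest transfer size mentioned above


def bytes_moved(address, length):
    """Bytes transferred if every touched 128-byte block moves whole."""
    if length <= 0:
        return 0
    first_block = address // MIC_GRANULE
    last_block = (address + length - 1) // MIC_GRANULE
    return (last_block - first_block + 1) * MIC_GRANULE


def efficiency(address, length):
    """Fraction of the moved bytes that the program actually asked for."""
    moved = bytes_moved(address, length)
    return length / moved if moved else 0.0


print(bytes_moved(0, 128), efficiency(0, 128))    # aligned, full block
print(bytes_moved(0, 16), efficiency(0, 16))      # small request
print(bytes_moved(120, 16), efficiency(120, 16))  # small and straddling two blocks
```

Under this model, a 16-byte access that straddles a block boundary drags 256 bytes along, which is why large sequential transfers suit the MIC much better.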

That being said, is the MIC a bottleneck or an accelerator? Well, you have to put it into perspective: bandwidth optimisation is critical in data-hungry systems. In the past, we’ve seen solutions like the write-gather pipe or the write back buffer, so the MIC is simply a new proposal to a recurring problem. For what it’s worth, Sony claims the transfer rate is 25.6 GB/s. Still, there are too many factors that will condition the final rate in practice (you’ve seen how convoluted it is to move data from one place to another within Cell).
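The 25.6 GB/s figure can be reproduced from the numbers given earlier (octa-rate signalling and two 32-bit buses) if we assume a 400 MHz base clock for the XDR chips. That clock value is not stated in this article, so treat it as an assumption of this sketch.

```python
XDR_BASE_CLOCK_HZ = 400_000_000   # assumption: not stated in the article
TRANSFERS_PER_CLOCK = 8           # 'octa-rate'
BUS_WIDTH_BITS = 2 * 32           # two 32-bit buses, one per pair of chips


def xdr_peak_bytes_per_second():
    """Theoretical peak; it ignores alignment and read/write turnaround."""
    return XDR_BASE_CLOCK_HZ * TRANSFERS_PER_CLOCK * BUS_WIDTH_BITS // 8


print(f"{xdr_peak_bytes_per_second() / 1e9:.1f} GB/s")
```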

This is as far as it goes for main RAM, but there’s more memory elsewhere: the hard drive. The PS3 also enables games to use 2 GB from its internal hard drive as a working area (similarly to what the original Xbox provided) [15].

Inside Cell: The assistants

We’ve seen before how Sony always supplies a general-purpose processor (the PPE in this case) accompanied by accelerators to reach acceptable gaming performance (the VPUs and IPU in the case of the PS2; and the GTE and MDEC with the PS1). This is common practice with video game console hardware, as general-purpose processors can perform a wide range of tasks, but are not specialised in anything. Video game consoles require only a subset of skills (physics, graphics and audio, for instance), so co-processors bring them up to the task.

[The PPE] is a watered-down version to reduce power consumption. So it doesn’t have the horsepower that you see in say a Pentium 4 (…) If you take the code that runs today on your Intel or AMD, whatever your power, and you recompile it on Cell, it’ll run today - maybe you have to change the library or two, but it’ll run today here, no problem. But it’ll be about 60% slower, 50% slower and so people say “oh my god! this cell processor’s terrible!” but that’s because you’re only using that one piece [16].

– Dr. Michael Perrone, manager of the IBM TJ Watson Research Center’s Cell Solutions Department

The accelerators included within the PS3’s Cell are the Synergistic Processor Elements (SPEs). Cell includes eight of them, although one is disabled when the console boots up. This is because chip fabrication requires an exceptional amount of precision (Cell initially used the 90 nm fabrication process) and the machinery is not perfect. So, instead of chucking out circuitry that came out less than 10% defective, Cell includes one spare SPE. Thus, if one of them comes out defective, the whole chip is not discarded. Now, that spare SPE will always be deactivated, regardless of whether it’s fine or not (Sony can’t have two different PS3s in the market).

Composition of the SPE

Moving on, the Synergistic Processor Element (SPE) is a tiny independent computer inside Cell, commanded by the PPE. Remember what I explained before about the adoption of elements from homogeneous computing? Well, these coprocessors are somewhat general-purpose and not restricted to a single application, so they will be able to assist in a wide range of tasks, that is, as long as developers can program them properly.

Simplified diagram of the Synergistic Processor Element (SPE), there are eight of these within Cell (one disabled)

Just like we did with the PPE, we’ll have a look at the SPE. It’s a shorter one so if at the end, you’d like to learn more about the SPEs, check out the ‘Sources’ section at the end of the article. That being said, let’s start…

The SPE is a processor that follows similar structuring to the PPE, being made of two parts:

The Memory Flow Controller (MFC) is the block that interconnects the core with the rest of Cell. It is equivalent to the PowerPC Processor Storage Subsystem (PPSS) in the PPE. The MFC’s main job is to move data between the SPU’s local memory and Cell’s main memory, and to keep the SPU in sync with its neighbours.

To perform its duties, the MFC embeds a DMA controller to handle communication between the EIB and the SPU’s local memory. Furthermore, the MFC houses another component called Synergistic Bus Interface (SBI) that sits between the EIB bus and the DMA controller. It’s a very complex piece of circuitry to summarise, but it basically interprets commands and data received from outside and signals the internal units of the SPE. As the front door to Cell, the SBI operates in two modes: bus master (where the SPE is set to request data from outside) or bus slave (where the SPE is set to receive orders from outside).

As a curious fact, considering the limit of EIB packets (up to 128-bit long), the MFC’s Direct Memory Access block can only move up to 16 KB of data per transfer; otherwise, the EIB throws a ‘Bus Error’ exception during execution [17].
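In practice, the 16 KB ceiling means that larger copies have to be broken into several DMA requests. The sketch below shows one way to do the splitting. The function name and the tuple layout are hypothetical; this is not the API of any real SDK.

```python
MAX_DMA_BYTES = 16 * 1024  # the 16 KB limit mentioned above


def split_transfer(local_addr, main_addr, size):
    """Yield (local_addr, main_addr, chunk_size) pieces of at most 16 KB."""
    offset = 0
    while offset < size:
        chunk = min(MAX_DMA_BYTES, size - offset)
        yield (local_addr + offset, main_addr + offset, chunk)
        offset += chunk


# A 40 KB copy becomes two full 16 KB requests plus an 8 KB remainder.
for piece in split_transfer(0x0000, 0x100000, 40 * 1024):
    print(piece)
```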

Architecture of the SPU

Like any CPU, the Synergistic Processor Unit (SPU) is programmed using an instruction set architecture (ISA). Both SPU and PPU follow the RISC methodology. However, unlike the PPU (which implements a PowerPC ISA), the SPU’s ISA is proprietary and mainly composed of a SIMD-type instruction set. As a result, the SPU features 128 128-bit general-purpose registers which house vectors made of 32/16-bit fixed-point or floating-point values. On the other hand, to preserve memory, SPU instructions are much smaller, just 32 bits long. The first part contains the opcode and the rest can reference up to three operands to be computed in parallel.
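To make the idea of a 128-bit register full of smaller values more tangible, this sketch packs four 32-bit floats into 16 bytes and adds two such ‘registers’ lane by lane. It uses Python’s struct module purely as an illustration. The helper names are mine, and the big-endian byte order is chosen because Cell is a big-endian machine.

```python
import struct

FOUR_FLOATS = ">4f"   # four 32-bit floats, big-endian: 16 bytes in total
EIGHT_SHORTS = ">8h"  # the same 16 bytes seen as eight 16-bit integers


def pack_floats(values):
    """Build a 16-byte image of one 128-bit register from four floats."""
    return struct.pack(FOUR_FLOATS, *values)


def add_floats(reg_a, reg_b):
    """Lane-wise addition, which a SIMD unit performs with one instruction."""
    a = struct.unpack(FOUR_FLOATS, reg_a)
    b = struct.unpack(FOUR_FLOATS, reg_b)
    return struct.pack(FOUR_FLOATS, *(x + y for x, y in zip(a, b)))


result = add_floats(pack_floats([1, 2, 3, 4]), pack_floats([10, 20, 30, 40]))
print(struct.unpack(FOUR_FLOATS, result))
print(len(result), "bytes,", len(struct.unpack(EIGHT_SHORTS, result)), "lanes of 16 bits")
```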

This closely resembles the Vector Floating Point Unit that debuted in the PS2, but a lot has changed since then. For instance, the SPU doesn’t require developers to learn a new proprietary assembly language - IBM and Sony provided toolkits to program the SPUs using either C++, C or assembly.

In terms of design, this processor doesn’t execute all instructions using the same unit. Execution is instead divided into two blocks or ‘execution pipelines’: one is called Odd Pipeline and the other is named Even Pipeline. These two pipelines execute different types of instructions, enabling the SPU to issue two instructions per cycle whenever it’s possible to do so. On the other hand, the SPU will never dual-issue instructions that depend on each other, thus mitigating data hazards that may emerge.
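The two conditions described above (different pipelines, no dependency) can be condensed into a small check. This is a simplified model of my own: the real hardware applies further rules that are not covered here, and the instruction names and register labels are only illustrative.

```python
from collections import namedtuple

# pipeline is "even" or "odd"; dest may be None for instructions writing no register
Instr = namedtuple("Instr", "name pipeline dest sources")


def can_dual_issue(first, second):
    """True if the pair uses different pipelines and has no register dependency."""
    if first.pipeline == second.pipeline:
        return False
    if first.dest is not None:
        if first.dest in second.sources or first.dest == second.dest:
            return False
    return True


add = Instr("add", "even", "r3", ("r1", "r2"))
load = Instr("load", "odd", "r4", ("r5",))
store = Instr("store", "odd", None, ("r3", "r6"))

print(can_dual_issue(add, load))   # independent pair on different pipelines
print(can_dual_issue(add, store))  # the store reads r3, which the add writes
```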

Let’s now take a look at the two pipelines [19]:

Simplified diagram of the odd pipeline

The Odd pipeline executes most of the instructions except the arithmetic ones.

First of all, you’ll find the SPU load/store unit (SLS) does three essential things:

Houses 256 KB of local memory to store instructions and data. The type of memory fitted is single-ported (considering this is a critical area, it’s a bit disappointing they didn’t use dual-ported chips…). Furthermore, the address bus is 32-bit long.
Executes load and store instructions.
Forwards instructions to another block for issuing.

Notice that there are only 256 KB available to store the program. Considering SPU programs can be compiled using C/C++, it’s not easy to predict how large the program will be. For this reason, it is recommended that programmers assume there’s only half the memory available (128 KB) [20], which leaves enough room for the compiled code to take up as much space as it needs, although this comes at the cost of storage and efficiency.

Finally, there’s also an SPU Channel and DMA Transport (SSC) unit, which the Memory Flow Controller uses to fill and/or fetch local memory, and a puny Fixed-Point Unit that only does shuffling and vector rotation.

Inside Cell: Programming styles

As we reach the end of Cell, you may ask: how are developers supposed to program this monster? Well, similarly to the previous programming models devised for the Emotion Engine, IBM proposed the following methodologies [21]:

PPE-centric approaches

Representation of the multistage pattern, where the PPE assigns a task that gets passed around each SPE, and eventually returned with the data processed.

Representation of the parallel pattern, where the PPE assigns a subtask to each SPE, and in turn each SPE returns the processed data, which the PPE merges.

Representation of the services pattern, where the PPE allocates a different task to each SPE, and each works individually to fulfil it.

PPE-centric approaches are a set of programming patterns that place the main responsibilities on the PPE and leave the SPE for offloading work. There are three possible patterns:

Multistage pipeline model: the PPE is tasked with sending work to a single SPE, which in turn performs the required computations and passes the results to the next SPE. This continues until the last SPE in the chain sends the processed data back to the PPE.
- For obvious reasons, IBM doesn’t suggest this design for primary tasks as it requires a considerable amount of bandwidth and it tends to be difficult to maintain.
Parallel stages model: the PPE divides its main task into independent sub-tasks and sends each one to a different SPE. Each SPE then returns the processed data to the PPE after they finish, then, the PPE combines it to produce the final result.
Services model: each SPE gets assigned a single job (e.g. MPEG decoding, audio streaming, perspective projection, vertex lighting, etc.) and the PPE is in charge of sending raw data to the designated SPE. While waiting, the PPE carries out other functions.
- While this implies each SPE will have a single job, their job designation is not meant to last forever. The PPE must reallocate different jobs on the fly as the needs of the program change.
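Two of these patterns are easy to mimic on a regular computer, so here is a rough sketch using Python threads as stand-ins for SPEs. Nothing here resembles the real Cell SDK: the function names are made up, and the count of six workers simply reflects the eight SPEs minus the disabled one and the one reserved for the operating system.

```python
from concurrent.futures import ThreadPoolExecutor

SPE_COUNT = 6  # eight SPEs, minus one disabled and one reserved for the OS


def multistage(data, stages):
    """Multistage pipeline: the output of each stage feeds the next one."""
    for stage in stages:
        data = stage(data)
    return data  # the last stage hands the result back to the 'PPE'


def parallel_stages(data, work, merge, workers=SPE_COUNT):
    """Parallel stages: split the task, process the pieces, merge the results."""
    chunks = [data[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(work, chunks))
    return merge(partials)


doubled_sum = multistage([1, 2, 3], [lambda v: [x * 2 for x in v], sum])
total = parallel_stages(list(range(100)), work=sum, merge=sum)
print(doubled_sum, total)
```

The services model would look similar to the parallel one, except that each worker keeps its own dedicated function instead of sharing a single one.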

SPE-centric approaches

Representation of the SPE-centric pattern, where each SPE is in charge of its functionality and only interacts with the PPE to obtain a resource.

Instead of using the SPEs to serve the PPE, it’s the other way around. Using the internal DMA unit, SPEs fetch and execute tasks stored in main memory, while the PPE is limited to resource management.

This model is a lot more radical than the rest, in the sense that previous patterns are closer to the traditional and PC-like ‘general-purpose processor with co-processors’ paradigm. Thus, codebases implementing SPE-centric algorithms may be harder to port to other platforms.
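As a rough analogy of the SPE-centric pattern, the following sketch lets each worker pull its next task from a shared queue (standing in for main memory) without a coordinator handing work out. Threads play the role of SPEs, and all names are hypothetical.

```python
import queue
import threading


def run_spe_centric(tasks, workers=6):
    """Each worker fetches (function, argument) pairs by itself until none remain."""
    pending = queue.Queue()
    for task in tasks:
        pending.put(task)

    results = []
    lock = threading.Lock()

    def spe_loop():
        while True:
            try:
                func, arg = pending.get_nowait()  # the 'SPE' fetches its own work
            except queue.Empty:
                return
            value = func(arg)
            with lock:
                results.append(value)

    threads = [threading.Thread(target=spe_loop) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results  # completion order is not guaranteed


print(sorted(run_spe_centric([(abs, -i) for i in range(10)])))
```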

Conclusion

As you can imagine, while the multi-core design of Cell accelerates emerging techniques such as procedural generation, none of these designs are particularly simple to implement, especially considering game studios prefer codebases that can be shared across different platforms.

To give you an example, the developers of the Unreal Engine 3 (Epic Games) demonstrated the limitations of the SPUs while trying to implement their collision detection system [22]. Their design relies on Binary Space Partitioning (BSP), an algorithm strongly dependent on comparisons (branching). Because the SPUs don’t provide dynamic branch prediction like the PPU, their implementation disappointed Playstation 3 users when compared side-by-side with other platforms (e.g. Xbox 360 or i386 PCs, both of which provide advanced prediction techniques in all their cores). Hence, Epic Games had to resort to further optimisations only compatible with Cell.

I suppose it’s just a matter of time, patience and a lot of learning for software engineers to crank up the full potential of Cell. However, history has demonstrated that’s not feasible for every studio, which makes me wonder if that’s the reason current console hardware (as of 2021) has homogenised so much.

Graphics

If you thought that Cell, with all its quirks, could take care of every task of this console, then let me tell you something hysterical: Sony fitted a separate chip for 3D graphics.

Uncharted 3: Drake’s Deception (2011).

The Elder Scrolls V: Skyrim (2011).

Killzone 3 (2011).

One Piece: Pirate Warriors (2012).

Example of PS3 games.
All rendered at their maximum resolution (1280x720 pixels).

It appears that even with a supercomputer chip, Sony still had to fetch a GPU to finalise the Playstation 3. This makes you wonder if IBM/Sony/Toshiba hit a wall while trying to scale Cell further, so Sony had no option but to get help from a graphics company. This is pure speculation, however. I’m not sure if I’ll ever know the answer.

What I do know is that the PS3 contains a GPU chip manufactured by Nvidia meant to offload part of the graphics pipeline. The chip is called Reality Synthesizer or ‘RSX’ and runs at 500 MHz [23]. Its clock speed looks concerning when compared to Cell’s (3.2 GHz), though you’ll soon see that the GPU is better equipped for computing huge amounts of operations in parallel. So it’s a matter of finding a balance between Cell and RSX when it comes to building the graphics pipeline (though I must confess this sounds simpler on paper than it is in practice).

I will now perform the same level of analysis previously done with Cell, this time focusing on RSX and its graphics capabilities.

Overview

It’s been five years since Nvidia debuted the Geforce3/NV20 lineup in 2001. Back then, the arena was contested by strong players like 3dfx, S3 and Artx/Ati. In subsequent years, though, the number of companies slowly dwindled to the point that by 2006, only Ati and Nvidia remained as the flagship video card suppliers in the PC market.

RSX chip next to Cell

The RSX inherits existing Nvidia technology. It’s reported to be based on the 7800 GTX model sold for PCs, which implements the Geforce7 (or NV47) architecture [24], also named ‘Curie’.

In my previous Xbox analysis, I talked about the Geforce3 and its debuting pixel shaders, so what has changed since then? There have been some ups and downs, but mostly incremental changes, so nothing too groundbreaking compared to Geforce3’s pixel shaders.

On the other side, while the 7800 GTX relies on the PCI-express protocol to communicate with the CPU, the RSX has been remodelled to work with a proprietary protocol called Flex I/O [25], a distinct interface within Cell designed to connect to neighbouring chips. Flex I/O operates in two modes:

BIC mode for connecting other Cell processors (to be used in multi-processor environments).
The slower IOIF mode for connecting up to two peripherals, a ‘fast’ one and a ‘slow’ one.

Alas, the RSX is not Cell, so it goes through the IOIF protocol, using the fastest slot.

For comparison purposes, IOIF behaves as a 32-bit parallel bus with a theoretical bandwidth of up to 20 GB/s, while the PCI-express link used in the 7800 GTX (x16 1.0) is a 16-lane serial bus with a theoretical bandwidth of up to 4 GB/s.

Organising the content

RSX has 256 MB of dedicated GDDR3 SDRAM at its disposal. Surprisingly, it’s the same memory type found in the Wii. The memory bus runs at 650 MHz with a theoretical bandwidth of up to 20.8 GB/s.
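The 20.8 GB/s figure follows from the 650 MHz clock if we assume a 128-bit wide memory bus and two transfers per clock (the ‘double data rate’ part of GDDR3). Neither of those two values is stated above, so consider them assumptions of this sketch.

```python
GDDR3_CLOCK_HZ = 650_000_000  # memory bus clock quoted above
TRANSFERS_PER_CLOCK = 2       # assumption: double data rate signalling
BUS_WIDTH_BITS = 128          # assumption: bus width not stated in the article


def gddr3_peak_bytes_per_second():
    """Theoretical peak bandwidth between RSX and its local memory."""
    return GDDR3_CLOCK_HZ * TRANSFERS_PER_CLOCK * BUS_WIDTH_BITS // 8


print(f"{gddr3_peak_bytes_per_second() / 1e9:.1f} GB/s")
```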

Example of how data is organised across the memory available. Notice how RSX can access its content from different memory chips.

Inside those 256 MB, Cell can place everything that RSX will need to render a frame. That includes vertex data, shaders, textures and commands. Now, thanks to Cell’s Flex I/O bus, RSX can also utilise the aforementioned 256 MB of XDR memory (CPU’s main RAM) as working space, though this will come with some performance penalties. This comes in handy if the rendered frame will be post-processed by an SPU, for instance.

As you can see, while this console didn’t implement a UMA architecture, it can still distribute graphics data across different memory chips if programmers decide to do so. I mention this because I wish many ‘technical explainers’ would read more about this feature before shouting over-summarising statements like “The PS3 was limited because it didn’t have UMA”. That may be true in certain cases, but unless they mention them, that generic claim is, in my opinion, misleading.

Finally, RSX supports many forms of data optimisation to save bandwidth. Examples include 4:1 colour compression, z-compression and ‘tiled’ mode (I’ll explain more about it later on).

Constructing the frame

Let’s now take a look at how RSX processes and renders 3D scenes.

Pipeline overview of the RSX

Its pipeline model is very similar to the Geforce3, but super-charged with five years of technological progress. So I suggest checking out that article beforehand since this one will focus on the new features. I also recommend reading about the Playstation Portable’s GPU because a lot of new developments and needs overlap with that chip. That being said, let’s see what we’ve got here… [26]

Diagram of the command stage

As with any other GPU, there must be a block of circuitry in charge of receiving the orders from outside. With RSX, this is handled by two blocks, Host and Graphics Front End.

The Host is responsible for reading commands from memory (either local or main) and translating them into internal signals that other components in RSX understand. This is done with the use of four sub-blocks:

Pusher: fetches graphics commands from memory and interprets branch instructions. It also contains 1 KB of prefetch buffer. The processed commands are sent to the FIFO Cache.
FIFO Cache: stores up to 512 commands decoded by the Pusher in a FIFO manner to provide quick access.
Puller: as the name indicates, it pulls commands from the FIFO cache whenever RSX is ready to render and sends them to the next unit.
Graphics FIFO: stores up to eight commands that will be read by the Graphics Front End.

The Graphics Front End then reads from the Graphics FIFO and signals the required units inside RSX to compute the operations. If you remember, this is equivalent to the ‘pfifo’ in the Geforce3.

As you can see, commands and data pass through many buffers and caches before reaching the final destination. This is intentional, as it prevents stalling the pipeline due to different units and buses operating at different speeds. So, cached memory takes advantage of fast bandwidth whenever it’s possible.
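To illustrate how those chained buffers decouple units running at different speeds, here’s a toy model of the Host block. The capacities are the ones quoted above. Everything else (class and function names, the ‘budget’ of commands the Graphics Front End consumes per tick, the command names) is hypothetical, and branch instructions are ignored for simplicity.

```python
from collections import deque

FIFO_CACHE_CAPACITY = 512   # commands, as quoted in the text
GRAPHICS_FIFO_CAPACITY = 8  # commands, as quoted in the text


class Host:
    """Toy model: Pusher -> FIFO Cache -> Puller -> Graphics FIFO."""

    def __init__(self, command_buffer):
        self.memory = deque(command_buffer)  # commands waiting in RAM
        self.fifo_cache = deque()
        self.graphics_fifo = deque()

    def pusher_step(self):
        # Fetch from memory only while the FIFO Cache has room
        while self.memory and len(self.fifo_cache) < FIFO_CACHE_CAPACITY:
            self.fifo_cache.append(self.memory.popleft())

    def puller_step(self):
        # Move commands forward only while the Graphics FIFO has room
        while (self.fifo_cache
               and len(self.graphics_fifo) < GRAPHICS_FIFO_CAPACITY):
            self.graphics_fifo.append(self.fifo_cache.popleft())

    def front_end_step(self, budget):
        # The Graphics Front End drains at its own pace (hypothetical budget)
        executed = []
        while self.graphics_fifo and len(executed) < budget:
            executed.append(self.graphics_fifo.popleft())
        return executed


def run(commands, front_end_budget=3):
    """Tick every stage until all commands have been consumed."""
    host = Host(commands)
    executed = []
    while host.memory or host.fifo_cache or host.graphics_fifo:
        host.pusher_step()
        host.puller_step()
        executed.extend(host.front_end_step(front_end_budget))
    return executed


commands = [f"cmd_{i}" for i in range(1000)]  # made-up command names
print(run(commands) == commands)  # checks that ordering survives the buffers
```

Notice that a slow consumer never forces the Pusher to drop anything: each stage simply waits until the next buffer has room.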

A unified Video Output

Gone are the days of console-proprietary video sockets and dozens of analogue signals squashed together in a single socket to accommodate every region on the earth. The Playstation 3 finally incorporated a unified video interface soon to be adopted worldwide: the High-Definition Multimedia Interface (HDMI), used for transferring both audio and video at the same time.

Back of the PS3, HDMI output on the left side and at the other extreme there’s the old Multi A/V for analogue video out.

The HDMI connector is made of 19 pins [33], all in a single socket. It transfers a digital signal, meaning that the picture and audio are broadcast using discrete zeroes and ones (rather than the range of continuous values found in analogue signals). Consequently, it doesn’t suffer from the interference or image degradation that previous equipment did, such as screen artefacts produced by cheap SCART cables.

To this day, the HDMI protocol is continuously being revised [34], with new versions of the specification offering more features (e.g. larger image resolution, refresh rate, alternative colour spaces, etc) while retaining the same physical medium for backwards compatibility.

Throughout the PS3 lifecycle, Sony added certain HDMI features of new revisions into the PS3 through software updates [35]. The last protocol compatible with the PS3 is version 1.4, most notably bringing support for ‘3D television’. Other capabilities, such as higher video resolutions, were not carried over, so output stayed capped at 1920x1080 pixels (and even so, most games rendered their frame-buffer at 1280x720 pixels).
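To put those two resolutions in perspective, here’s a quick back-of-the-envelope comparison. The assumption that rendering cost grows with the number of pixels is my own simplification, not a measurement.

```python
FULL_HD = (1920, 1080)
HD = (1280, 720)


def pixel_count(resolution):
    """Number of pixels in a (width, height) frame."""
    width, height = resolution
    return width * height


ratio = pixel_count(FULL_HD) / pixel_count(HD)
print(f"1920x1080: {pixel_count(FULL_HD):,} pixels")
print(f"1280x720:  {pixel_count(HD):,} pixels")
print(f"ratio: {ratio:.2f}")
```

In other words, a 1920x1080 frame-buffer contains more than twice as many pixels to fill on every frame.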

‘Real’ 3D vision/projection

So what was that ‘3D television’ I mentioned before? Well, it so happens that the lifetime of this console overlapped with a short-lived fever for 3D tellies (the so-called 3DTV) [36]. To support these, Sony updated their SDK to assist the rendering of stereoscopic frames in RSX and implemented the ‘3D specification’ in their HDMI encoder. What’s happening behind the scenes is that the encoder broadcasts two frames at a time, and the television alternates them similarly to what the Master System’s 3D glasses used to do 30 years before.

Audio

I’m afraid you won’t see a lot of information in this section anymore, mainly because since the last portable invention, audio has silently shifted to the software side. In other words, there are no longer dedicated audio chips.

You see, while the need for better graphics tends to grow exponentially (consumers want more scenery, better detail and colours), you won’t hear the same level of demands for sound. I presume this is because the capabilities have reached our cognitive limit (44.1 kHz sampling rate and 16-bit resolution). The only thing left is to implement more channels and effects, but these don’t need the processing power that would require installing specialised chips, at least with consumer equipment.
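Those two numbers can be put into context with a couple of textbook formulas. In this sketch, the 20 kHz upper limit of human hearing is a commonly cited approximation that I’m adding for comparison; it’s not a figure from this article.

```python
import math


def nyquist_limit_hz(sample_rate_hz):
    """Highest frequency that a given sampling rate can represent."""
    return sample_rate_hz / 2


def dynamic_range_db(bits):
    """Ratio between the largest and smallest integer sample step, in dB."""
    return 20 * math.log10(2 ** bits)


HEARING_LIMIT_HZ = 20_000  # assumption: commonly cited approximation

print(nyquist_limit_hz(44_100) > HEARING_LIMIT_HZ)
print(f"{dynamic_range_db(16):.1f} dB")
```

So, under that assumption, 44.1 kHz already covers the audible spectrum, which fits the idea that more channels and effects are the only things left to add.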

Summary of the audio pipeline.

So, in the end, audio is now completely implemented with software and processed by the SPUs (I mean the Synergistic Processor Unit, not the Sound Processing Unit! It’s a bit ironic that both share the same initials…). Moving on, Sony provides many libraries in their SDK that instruct the SPUs to carry out audio sequencing, mixing and streaming. And if that’s not enough, many effects can also be applied.
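To give an idea of what ‘mixing in software’ involves, here’s a minimal sketch that combines several 16-bit tracks into one. This is not Sony’s library nor its API: the function, its parameters and the sample values are all made up, and a real SPU routine would process many samples at once with vector instructions.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def mix(tracks, gains=None):
    """Mix equally long lists of 16-bit samples into a single track."""
    gains = gains or [1.0] * len(tracks)
    mixed = []
    for samples in zip(*tracks):
        total = sum(sample * gain for sample, gain in zip(samples, gains))
        # Saturate instead of letting the value wrap around
        mixed.append(max(INT16_MIN, min(INT16_MAX, round(total))))
    return mixed


music = [1000, -2000, 30000, -30000]   # made-up samples
effect = [500, 500, 10000, -10000]     # made-up samples
print(mix([music, effect]))
print(mix([music, effect], gains=[0.5, 0.5]))
```

The clamping step is what prevents loud passages from turning into crackling noise once the sum exceeds the 16-bit range.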

That being said, where is the audio signal sent for broadcast? The RSX. This chip also contains the ports used to broadcast raw audio signals to the TV. Before sending it, the signal is encoded in different formats, depending on the output selected (analogue, HDMI or S/PDIF, the latter is also called ‘digital audio’).

I/O and backwards compatibility

All I/O operations are delegated to another chunky chip called Southbridge [37]. This is very similar to the architecture the original Xbox adopted back in the day. It seems as if the architectural gap between consoles is becoming narrower, or maybe this approach proved very reliable and it’s architecture-agnostic. I’ll let you decide.

The big Southbridge chip supervising small I/O chips and interfaces

The same picture with important parts labelled

Like the PS2’s IOP, the Southbridge is completely proprietary, though this time made by Toshiba (they called it the ‘Super Companion Chip’ [38]). So, while it still remains an obscure piece of silicon, it does a superior job consolidating many interfaces and protocols, both external (i.e. USB, Ethernet, etc) and internal (i.e. SATA). For reference, in the past, the IOP’s slow clock speed ended up bottlenecking speedy interfaces like ATA and Ethernet, greatly reducing their full bandwidth.

Furthermore, the Southbridge seamlessly implements encryption algorithms to protect the data travelling through standard protocols, such as the contents of the Hard Drive.

Diagram of the Southbridge’s connections

Overall, Southbridge embeds an enormous number of interfaces. This has to do with the fact that this console was designed during the ‘multimedia hub’ trend. It’s not enough for video-game consoles to play games; they also need to become DVD and Blu-ray players, set-top boxes (partially), photo viewers (by importing the camera’s photos using the multi-card reader) and possibly more as the needs evolve (thanks to its updatable operating system).

External interfaces

In the case of user-accessible ports, the Southbridge is connected to:

A USB 2.0 hub: provides four front USB A ports. These can be used for accessories or to link/charge the controllers.
A Serial ATA (SATA) interface: connects the Blu-ray drive and a 2.5" Hard Disk.
- Until 2008, Blu-ray readers interfaced with Parallel ATA [39], so an intermediate chip was fitted in the middle to do the SATA → PATA conversion.
1000/100/10 (Gigabit) Ethernet Controller: in the form of an RJ45 socket on the back, but it also forks to a Wireless daughterboard, providing Wi-Fi 802.11b/g and Bluetooth 2.0 connection.
A Multi-card reader: provides slots for Memory Stick, SD, MultiMediaCard (MMC), Microdrive and Compact Flash.

‘Less wire’ equipment

Thanks to the wide adoption of Bluetooth technology, wired controls are now a thing of the past. The new form of the PS2’s Dualshock 2 controller is called Sixaxis and while it’s not the radical change others decided to go for, it features a gyroscope for new types of human input. This comes at the expense of the haptic feedback (Rumble), however. A year later, Sony surprised players with the Dualshock 3, which restored the haptic motor.

On a different topic, you can now turn on the console from the wireless controller.

Internal interfaces

Regarding internal components, Southbridge connects to:

Starship 2: an adapter for two 128 MB NAND Flash chips. Behind the scenes, Starship bridges the Southbridge’s local bus with the standardised ‘Common Flash Interface Protocol’ (widely adopted for interfacing Flash memory) [40]. The PS3 stores the operating system on these, among other things.
The Playstation 2 chipset: at the corner of the motherboard there’s an eye-catching chip that houses none other than the Emotion Engine and the Graphics Synthesizer. The EE+GS combo connects to 32 MB of RDRAM and an IO bridge (named ‘PS2 bridge’), which combined form roughly 90% of the original Playstation 2.
- The EE+GS chip sends the video signal directly to the RSX.
- These chips are not accessible by developers, they are used for backwards compatibility only!

Backwards compatibility

Having mentioned the PS2 chips, I guess this is my cue to talk about backwards compatibility of the Playstation 3 once and for all.

First things first, let me introduce how backwards compatibility generally works: consoles can play their predecessor’s games with the help of software (which instructs existing hardware to behave as the old game would expect) and/or hardware (either the existing hardware provides total or partial backwards compatibility; and/or the company adds extra chips to recreate the older system within the new motherboard). With the amount of processing power the PS3 shows, you would expect Sony to ship a PS2 emulator running within Cell and accelerated by RSX. Well, for some reason that didn’t happen and instead, Sony fitted the PS2’s chipset at one corner of the motherboard.

The big EE+GS chip, two 16 MB RDRAM modules and the ‘PS2 bridge’.

The same picture with important parts labelled.

On the other side, the missing but not-as-critical chips (IOP, SPU, etc) are replicated with software running in Cell. In the case of game saves, users initially had to acquire a memory card adaptor, but once a new software update landed, Memory Cards became emulated as disk images stored in the hard disk, while Magic Gate (the encryption system) is handled seamlessly by one SPU.

Since the Cell and RSX are still ‘on’ while playing a PS2 game, the system offers two scaling methods for increasing the screen area during gameplay: ‘nearest neighbour’ or ‘smoothed’ (anti-aliased).
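The first method is simple enough to sketch in a few lines. Bear in mind this is a generic nearest-neighbour routine of my own operating on a tiny made-up image, not the code the PS3 runs. The ‘smoothed’ option would blend neighbouring pixels instead, but since the exact filter isn’t documented here, I won’t attempt to guess it.

```python
def scale_nearest(image, new_width, new_height):
    """Resize a 2D list of pixels by copying the closest source pixel."""
    old_height, old_width = len(image), len(image[0])
    return [
        [image[y * old_height // new_height][x * old_width // new_width]
         for x in range(new_width)]
        for y in range(new_height)
    ]


tiny = [[1, 2],
        [3, 4]]  # made-up 2x2 'frame', each number is a pixel colour

for row in scale_nearest(tiny, 4, 4):
    print(row)
```

Each source pixel ends up repeated as a block, which explains the sharp but blocky look of this method.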

PS3’s user interface showing the game entry after inserting a PS2 disc.
(Don’t worry about the other icons for now, as some are not even official).

All in all, thanks to this setup, the PS3 runs PS2 games at an impressive compatibility rate. On top of all, you can take advantage of new features that come with the new console (wireless control, HDMI interface, virtual memory cards).

As if that wasn’t enough, PS1 games can run as well, this time without needing to embed the old SoC or GPU (it relies on pure software emulation).

The strange end of terms

Throughout the lifecycle of the PS3, Sony slowly trimmed PS2-only chips from the PS3 motherboard to the point backwards compatibility was solely software-emulated (with greater limitations, such as only running PS2 games purchased from their online store). Because Sony never replaced the PS2 chipset (like it previously did with the PS1 hardware inside the PS2), it makes you wonder about the technical and executive rationale behind this. Well, as a case study, here’s my quick opinion about the reasons for this:

Timing: Sony likely intended PS2 owners to buy their new product as a replacement of their current one, as this is more affordable for consumers (they can sell their old system). However, for some reason, Sony didn’t have a software emulator prepared before release day, so they initially resorted to adding extra chips. Later on, as the software emulation progressed satisfactorily, they slowly removed these in further motherboard revisions.
- To complement this, developer ‘M4j0r’ commented: “An interesting point might be that Sony developed the two hardware emulation revisions at the same time (EE/GS and GS only), I guess because some games run better depending on which you use.” [41].
Costs: The first revision of the console (CECHA, only in Japan and US), which was PS2-compatible, launched in 2006 at $599.99 or ¥60,000 without taxes (£425 adjusted for 2020 inflation) [42]. The following model (CECHC, shipped in 2007 internationally) removed the Emotion Engine and RDRAM (shifting those tasks to software emulation) and launched in the UK with a £425 price tag (£603 in 2020 money). Later in the same year, Sony released a new model (CECHG) without any PS2-related chip for £126 less [43]. All this suggests that backwards compatibility is, in the end, an expensive feature.
Idling hardware and wasted power: While Cell and RSX still take care of some tasks to recreate the original environment, these are minimal compared to their full potential. Combined with the fact CECHA models have a cumulative power consumption of 399 Watts [44], it does make you wonder if this design is worth the power consumption, let alone efficient (for comparison, CECHG’s new power supply consumes 285 Watts).
- I understand there are other factors involved in the reduction of power consumption, like the new revisions of Cell and RSX. However, I still believe the PS2’s chipset plays an important role.
Inflexibility: The EE+GS chip is not re-programmable, which means the end result will always be the same, regardless of whether there are glitches or possible enhancements. Compare this to the PCSX2 emulator’s graphic enhancements [45] and its modding capabilities [46]: these show us that room for improvement is possible and appreciated.

Personally, I believe pure software emulation is the most feasible option in the long term due to its scalability, customisation, and independence from proprietary hardware. But of course, this takes more effort to implement accurately, as the ongoing development of PCSX2 by a volunteer-driven community demonstrates (please note that the aforementioned emulator only runs on x86 PCs, however).

Lateral compatibility

We are not done talking about compatibility just yet! It may surprise you that Sony also allowed users to run a subset of Playstation Portable games. Here, emulation was carried out completely with software, just like the PS2 compatibility in later models.

As there isn’t any UMD disc reader in the PS3, users must access a game catalogue from Sony’s online store to download and install any PSP game.

Operating System

Now that home consoles have become powerful multimedia hubs, a more convoluted operating system will be needed to provide users with more services and games with a thicker layer of abstraction, all while keeping security and performance up to the task.

Consequently, terms like shell or BIOS are no longer used to describe this area, not because they don’t exist anymore but because they describe a small fraction of the new system. The generic term is now ‘operating system’, which comprises many areas (boot loader, kernel, user interface) analysed separately. As always, I recommend checking out the PSP’s OS first, since its modular design is a recurrent ingredient in the PS3.

Cell’s privileged security

Before we dive into details, I need to mention the different modes of operation in Cell. I originally planned to describe this in the ‘CPU’ section, but since that got incredibly dense, I’ll introduce it here where you’ll see its practical use right away. Furthermore, its modes also affect the design of any operating system running within Cell - not just the one Sony developed for this console.

Having said that, to safeguard against unauthorised access to sensitive data and/or resources, Cell implements a set of privilege levels inherited from the PowerPC specification. In other words, Cell executes programs in two modes:

Privileged mode: Cell grants access to every corner of its hardware (registers, memory addresses, opcodes, etc) [47]. For security reasons, this mode should only be used by the operating system’s core (i.e. the Kernel).
- Moreover, Cell was also prepared to run multiple operating systems concurrently and, to achieve that at the hardware level, ‘privileged mode’ can be further divided into Privilege 1 and Privilege 2. ‘Privilege 2’ is meant to be used by a Kernel, while ‘Privilege 1’ is used by a Hypervisor; the latter arbitrates resources between different Kernels running at the same time.
- The ‘hypervisor’ functionality also turned into an area of research in IBM’s headquarters [48] [49].
User Mode: As the name indicates, Cell only grants a limited set of resources [50]. This mode is aimed at traditional applications running on top of the operating system. If, for any reason, a program requests access to a protected location, then execution jumps to the Kernel or Hypervisor, which decides whether access should be granted or not.

Additionally, SPEs provide a mode of operation called isolated mode, which shields the execution process within the SPU so no external unit (the PPE or other SPEs) can access it until the SPU finishes. This can be activated after uploading a program to any SPE and ensures that the processor is not tampered with while sensitive code (e.g. an encryption routine) is being executed.

Sony’s operating system, which I’m going to describe in the following paragraphs, uses all the modes described to handle its security.

Overview

As I said before, the OS is quite complex. So, to be able to follow this section without too much trouble, we can divide the type of files we’ll find in the operating system of this console into different layers:

Loaders: to make a long story short, programs/binaries in this console are systematically encrypted. So, ‘Loaders’ are programs that execute ‘real’ programs. To put it another way, Loaders grab binaries, decrypt them, check their authenticity and finally send them to the respective processor (the PPE or any SPE) for execution. As if that weren’t complicated enough, Loaders are chained together to protect the software even further. Finally, Loaders are found across many mediums.
- Some Loaders are updated by Sony (through software updates) while others can’t be changed. This is independent of whether they are installed in re-writable storage, as some loaders are encrypted using console-specific keys, so they can’t be altered after the console leaves the factory (at least through traditional means).
System files: these comprise low-level binaries (executed through Loaders), metadata for organising the hardware, utilities and other assets (i.e. fonts, imagery). Just like Loaders, there are console-specific system files that cannot be replaced or autogenerated.
- Some binaries borrowed code from the FreeBSD and NetBSD projects [51].
User Content: This includes configuration files (e.g. internet settings), data used by games (e.g. game installation files and saves) and data generated automatically by the console (e.g. hard drive information).
- Unlike the other layers, destruction of this data does not lead to catastrophic outcomes.
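The job of a Loader (decrypt, check authenticity, dispatch) can be sketched as follows. Please take it as a toy: the XOR ‘cipher’ and the HMAC check merely stand in for the real algorithms, which are not described in this section, and every key, name and payload below is hypothetical.

```python
import hashlib
import hmac
from itertools import cycle


def xor_cipher(data, key):
    """Toy symmetric cipher; a stand-in for the real, unspecified one."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


def package(binary, enc_key, auth_key):
    """What the vendor would do: sign the binary, then encrypt it."""
    tag = hmac.new(auth_key, binary, hashlib.sha256).digest()
    return xor_cipher(binary, enc_key), tag


def loader(encrypted, tag, enc_key, auth_key, run_on):
    """Decrypt a binary, check its authenticity and hand it to a processor."""
    binary = xor_cipher(encrypted, enc_key)                      # 1. decrypt
    expected = hmac.new(auth_key, binary, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):                   # 2. verify
        raise ValueError("authenticity check failed, refusing to run")
    return run_on(binary)                                        # 3. dispatch


ENC_KEY = b"hypothetical-encryption-key"
AUTH_KEY = b"hypothetical-signing-key"

encrypted, tag = package(b"print('hello')", ENC_KEY, AUTH_KEY)
result = loader(encrypted, tag, ENC_KEY, AUTH_KEY,
                run_on=lambda code: f"PPE runs {len(code)} bytes")
print(result)
```

If a single byte of the encrypted binary is altered, the check fails and nothing gets executed, which is the whole point of the exercise.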

OS' security hierarchy

Generally speaking, the PS3’s OS is designed with the same modular approach as the PSP. To recall the previous article, the OS is made of multiple modules. These may serve the user (like a game or app) or reside in memory indefinitely to serve other modules (in the form of system calls and/or drivers). Some modules have more privilege access than others (kernel module vs. user module).

Diagram showing how the components of the Playstation operating system fit in Cell’s privilege levels.
References to ‘OtherOS’ are further explained in the next sections.

Throughout its lifecycle, the operating system will call upon many modules, some with more privileges than others. Sony constructed its OS so modules run under Cell’s three privilege levels:

Level 1: here is where a Hypervisor programmed by Sony resides. Also referred to as lv1, this program is the door to every single bit of this console and chained to exceptions triggered by the MMU. That being said, the hypervisor only accepts requests by programs authorised by Sony (residing in the next privilege level). While the Hypervisor resides in memory, it also provides low-level system calls and FAT16 filesystem support.
Level 2: naturally reserved for the Kernel, a privileged program also named lv2 or ‘Supervisor’. The kernel abstracts the hypervisor so level 3 programs don’t have direct interaction with it. The Kernel provides multi-threading functions for both PPU and SPU. Ultimately, the Kernel bootstraps user-land modules.
Level 3: the rest of the programs (called user-land/userspace), including games and the visual shell, run at this level. These plebs are under the will of the Kernel to communicate with the console’s hardware and they cannot spawn any new process/program unilaterally.
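The way requests travel across these three levels can be modelled with a short sketch. The resource names and the table assigning them to levels are invented for illustration; only the hierarchy itself (user-land asks the Kernel, which in turn asks the Hypervisor) comes from the description above.

```python
from enum import IntEnum


class Level(IntEnum):
    HYPERVISOR = 1  # lv1
    KERNEL = 2      # lv2
    USER = 3        # games, visual shell


# Hypothetical resources and the level that must serve each request
REQUIRED_LEVEL = {
    "gpu_registers": Level.HYPERVISOR,
    "spu_threads": Level.KERNEL,
    "own_memory": Level.USER,
}


def access(resource, caller):
    """Return the chain of levels a request goes through until it is served."""
    chain = [caller]
    level = caller
    # A lower number means more privilege; escalate one level at a time
    while level > REQUIRED_LEVEL[resource]:
        level = Level(level - 1)
        chain.append(level)
    return chain


for resource in REQUIRED_LEVEL:
    path = " -> ".join(level.name for level in access(resource, Level.USER))
    print(f"{resource}: {path}")
```

Note how a user-land program never reaches the Hypervisor directly: the Kernel always sits in between.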

Storage medium

With all being said, where is all this data stored? From the general user perspective, there are only two visible mediums: Blu-ray discs for games and a Hard Disk for saves. Well, there are a few more, so we are now going to take a look at every one of them!

It turns out that, within Cell, there’s a small hidden ROM where manufacturers may store a ‘protected’ boot-loader. IBM provides this space to save any company (not just Sony) from having to manually implement obfuscation methods to protect their boot code, as off-the-shelf components are not always prepared for bespoke needs.

Since this piece is already physically protected with obfuscation, it doesn’t have to be encrypted. Thus, it’s ideal for a first-stage boot-loader (which can’t be encrypted) and the PlayStation 3 stores its early boot stage there.

Boot process

Alright, using all the previous knowledge, you are going to learn now how the system boots up - and let me tell you, it’s pretty complicated. The reason is simple: Sony doesn’t want you fiddling with their hardware or software, so they built many layers of obfuscation and encryption to prevent you from breaking in and side-loading your own code (and hopefully give up and keep buying games/movies/whatever) but, as history will tell you, the opposite happened.

In the following section, I’m going to describe what this console does once you push the power button. Note that this process only drastically changed once (after hackers cracked it). So, for simplicity purposes, we’ll start with the ‘original’ boot process (implemented before system version 3.60)[55][56][57]:

A separate chip in the motherboard (called Syscon) powers on and executes instructions from its internal ROM. It then sends a ‘Configuration Ring’ to Cell via SPI (a serial connection), which initialises Cell and deactivates the eighth SPU. Finally, it latches the power line and gives life to Cell.
Cell’s PPU reset vector points to its hidden ROM, which stores the routines to locate and decrypt bootldr from Flash. The decrypted piece is then loaded by the first SPU in isolation mode.
The now-isolated SPU, having loaded bootldr, initialises part of the hardware (XDR memory and I/O interfaces), decrypts a binary named lv0 and instructs the PPU to run it.
The PPU, now executing lv0, decrypts metldr (a console-specific loader) and sends it to the third SPU, again in isolation mode.
The SPU2, now executing metldr, executes the following loaders sequentially:
1. lv1ldr decrypts and loads lv1, which contains the Hypervisor that takes over the first privilege level. Moreover, lv1 sets up the hard drive, Blu-ray drive and RSX.
2. lv2ldr decrypts and loads lv2, which contains the kernel and runs on top of the hypervisor. It also finishes initialising RSX, the PS2 emulation, Bluetooth, USB controller and the Multi-card reader.
3. appldr decrypts and loads vsh (the Visual Shell) and other dependencies. vsh will later enable the user to load a game.
4. isoldr decrypts and loads modules that will run in the third SPU in isolation mode. These modules are critical for security and perform many cryptographic functions throughout the console’s lifecycle. Consequently, the third SPU is reserved for security functions and games can’t use it (leaving only six SPEs for games).

The PPU, having loaded vsh, grants the user control through a graphical user interface, which manifests itself with an iconic orchestral splash sound followed by the XMB menu.
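Since every stage is loaded by a previous one, the whole sequence can be seen as a chain of trust. The table below is my own transcription of the steps above (Syscon is left out for brevity, ‘Cell ROM’ is my shorthand for Cell’s hidden ROM, and the function names are hypothetical). It lets you trace what each component depends on.

```python
# Maps each component to the stage that decrypts/loads it (pre-3.60 boot)
LOADED_BY = {
    "bootldr": "Cell ROM",
    "lv0": "bootldr",
    "metldr": "lv0",
    "lv1ldr": "metldr",
    "lv2ldr": "metldr",
    "appldr": "metldr",
    "isoldr": "metldr",
    "lv1": "lv1ldr",
    "lv2": "lv2ldr",
    "vsh": "appldr",
    "isolated modules": "isoldr",
}


def chain_of_trust(component):
    """List every stage from the root of the chain down to 'component'."""
    chain = [component]
    while chain[-1] in LOADED_BY:
        chain.append(LOADED_BY[chain[-1]])
    return list(reversed(chain))


def depends_on(stage):
    """List the components that can only be trusted if 'stage' is intact."""
    return sorted(name for name in LOADED_BY
                  if stage in chain_of_trust(name)[:-1])


print(" -> ".join(chain_of_trust("vsh")))
print(depends_on("metldr"))
```

The second call shows how much weight rests on a single loader: the earlier a stage sits in the chain, the more components rely on it being trustworthy.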

Revisioned boot process

In March 2011, a hacker known by the name ‘GeoHot’ broke the security of metldr, thereby compromising the authenticity of subsequent loaders. Thus, Sony retaliated by issuing security updates in their hardware and software. These fixes are further discussed in the ‘Anti-piracy’ section of this article.

Visual Shell

Are you getting tired of all this theory? Let me switch to something everyone can actually see: The Visual Shell.

XrossMediaBar (XMB), a new user interface that gained international recognition two years before, has been slightly adapted so it can be operated from the sofa (the so-called ‘10-foot user interface’) and expanded to take advantage of ‘full HD’ resolution (1920x1080 pixels).

XMB in the PSP (2004)
Rendered at 480×272 pixels

XMB in the PS3 (2006)
Rendered at 1920x1080 pixels

While PSP users will find many familiarities, Sony added a new set of apps that use the potential of Cell, RSX and the Blu-ray drive. Many of them are related to multimedia (e.g. video player and image slideshow), television (such as on-demand TV apps, like BBC’s iPlayer), social profile (online avatars) and online purchases (Playstation Now and Playstation Store, to name a few).

Additionally, since this is a home console that might be shared by multiple members, XMB supports multiple users, where each one may use a different Playstation Network account and store separate user data (purchased games and saves).

Just like in the PSP, highlighting a game may style the background to get your attention!

The XMB provides an immense number of settings, especially helpful when you need to set up your shiny new 1080p telly with 7.1 surround audio.

Various multimedia options

XMB can install games, updates and expansions (DLCs) using a native package installer.

Finally, the inclusion of a hard drive is a relief for the veterans that in the past were obliged to buy expensive proprietary storage (Memory Stick Pro Duo) whenever they ran out of space.

Lend me your PS3

Impressively enough, not every app bundled with this console had self-interest goals. With the advent of distributed computing and the capabilities of Cell for data science projects, Stanford University joined hands with Sony to enable Playstation 3 owners to contribute to medical research. The result was Folding@home (pronounced ‘folding at home’).

Folding@home was an application installed in every Playstation 3 that, once opened by the user, connected to a central server and ran protein simulations. Moreover, the app was also allowed to run in the background during off-peak times.

Folding@Home displaying the work accomplished since the user started the app [58].

Throughout its lifetime, the combined computing power of 15 million PS3 users worldwide assisted Folding@home with their research towards curing Alzheimer’s disease [59]. In the end, Folding@home and Sony retired the app in 2012, and the former lives on through other platforms.

This is my personal opinion, but I enjoy reading about projects that make global contributions using the capabilities of distributed computing, as opposed to the never-ending sensational articles whining about cryptocurrency mining. I guess we shouldn’t forget that, with every new powerful technology, there will always be selfless applications developed for it.

A multi-OS proposal

When IBM described Cell from the software level, they mentioned that Cell is capable of running multiple operating systems at the same time, due to Cell’s many execution cores [60]. Thus, Sony took this idea forward and added an option in XMB to install a secondary operating system [61]. This feature was called OtherOS and, in a nutshell, provides a partition manager (XMB just guides the user to resize GameOS' partition and allocate new space for the second OS) and a button to boot from the second OS (thanks to OtherOS' boot files already set up in Flash). So, the user just needs to fill the new partition with an OS. Consequently, many Linux distributions (e.g. Ubuntu and Fedora) added the PS3 as another possible target to install on. You may consider this a spiritual successor to Linux for PS2.

Red Ribbon GNU/Linux is a distribution exclusive for the PS3/Cell and compiled using the PPC64 target [62].

Thanks to OtherOS, experienced users had the opportunity to develop homebrew applications running on Cell without licensing restrictions. This was particularly interesting for research/scientific purposes [63] [64], as this console carried a more affordable price tag than a mainframe. For multimedia purposes, the Blu-ray drive and Multi-card reader were also accessible from OtherOS.

On the other side, while OtherOS' privileges may surpass GameOS' (at the kernel level), they don’t overtake the hypervisor, which still resides in memory. So, any hardware access from OtherOS still depends on the will of Sony’s hypervisor, and it so happens that the latter blocks access to RSX’s command buffers (preventing the use of the shader units, among other components used for accelerating graphics operations). Consequently, the resulting Linux distributions resort to software rendering (all graphics are drawn by Cell) and then stream the frame-buffer to the RSX for display. While it’s disappointing that OtherOS can’t make use of the full capabilities of this console, this was probably done to reduce attack surfaces. Ironically, OtherOS' use of Cell is similar to how IBM/Toshiba/Sony may have originally envisioned the PS3!
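In case ‘software rendering’ sounds abstract, here’s a minimal sketch of the idea: the CPU writes every pixel into a block of memory, which is later handed over for display. The frame size, pixel layout and function names are all hypothetical and unrelated to the actual Linux drivers.

```python
WIDTH, HEIGHT, BYTES_PER_PIXEL = 64, 48, 4  # tiny made-up frame, XRGB layout


def new_framebuffer():
    """Allocate a blank (black) frame in regular memory."""
    return bytearray(WIDTH * HEIGHT * BYTES_PER_PIXEL)


def put_pixel(fb, x, y, rgb):
    """Write a single pixel at (x, y)."""
    offset = (y * WIDTH + x) * BYTES_PER_PIXEL
    fb[offset:offset + BYTES_PER_PIXEL] = bytes((0, *rgb))  # X, R, G, B


def fill_rect(fb, x0, y0, width, height, rgb):
    """Draw a filled rectangle, one pixel at a time, without GPU help."""
    for y in range(y0, y0 + height):
        for x in range(x0, x0 + width):
            put_pixel(fb, x, y, rgb)


fb = new_framebuffer()
fill_rect(fb, 8, 8, 16, 16, (255, 128, 0))
print(f"{len(fb)} bytes ready to be streamed for display")
```

Even this small rectangle takes hundreds of individual writes, which hints at why losing the GPU’s acceleration hurts.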

Sharing the same fate as Folding@home, OtherOS was eventually removed in a subsequent update, but due to different causes (mainly related to security). Shortly afterwards, OtherOS was unofficially restored thanks to software exploits and reverse engineering efforts. At the moment, OtherOS is available if the user installs a custom firmware. I explain this further in the ‘Anti-piracy and homebrew’ section.

At the time of this writing, developer René Rebe is implementing proper xf86 drivers that take advantage of the acceleration provided by RSX and its 256 MB of memory [65]. His work is combined with other developments that removed the restrictions imposed by the hypervisor (initially thanks to the discovery of software exploits and later with the use of a custom firmware; the latter is explained further in the ‘Anti-piracy and homebrew’ section). Mr. Rebe publishes his progress on his YouTube channel and relies on voluntary donations to continue his work [66].

Updatability

For the final part of this long section, let’s talk about the update capabilities of GameOS.

In a nutshell, just like the PSP, Sony distributed PS3UPDAT.PUP files, which package all the new OS binaries. Due to the console’s security system, only those files which aren’t secured with unique console keys and are stored in re-writable storage (Flash, Hard drive, eMMC) are updatable; the rest must stay as they are.

PUP files were distributed through Sony’s official website, the XMB update assistant or found in the contents of a game disc (all games embed a PUP file, reflecting the SDK version developed for). Since models with NAND Flash contain only 256 MB of space and store the whole OS there, Sony never released update files with sizes higher than 256 MB.

Games

This section encompasses topics related to game development, distribution and services.

Development ecosystem

As this console amalgamates technology from various companies, including products already commercialised in other markets (e.g. Nvidia’s GeForce 7 GPU line for PCs), developers were flooded with many different tools to develop their software. Note that this doesn’t imply development was easy, but it’s something to appreciate compared to the assembly days.

To program Cell, IBM and Sony shipped separate development suites. IBM’s targeted non-restrictive environments like Linux (and OtherOS), while Sony’s tools explicitly targeted the PS3’s GameOS as the only execution environment.

Having said that, IBM distributed the IBM Cell SDK package for free [67]. It includes the GCC toolchain modified to generate PPU and SPU binaries, allowing development in C, C++, Fortran and assembly. It’s also cross-platform, making it possible to compile code from other equipment (like an x86 PC). The SDK also included low-level libraries to facilitate SIMD mathematical operations and SPU-PPU management. Finally, it bundled a fork of the Eclipse IDE.

To ease the complexity of Cell development, IBM also developed another short-lived compiler called XLCL that compiles OpenCL code (a C/C++ variant for parallelised computations) for the PPU and SPU. However, this was only distributed through IBM’s alphaWorks channel, meaning it remained experimental.

Now, what about Sony? Well, similarly to their PSP SDK, they shipped hardware devkits (many variants with different sizes and enhancements) and a software package made of compilers, libraries and debuggers that used Visual Studio 2008 (and later 2010) as an IDE [68]. Since they only supported the PS3, their SDK included the same GCC toolchains but complemented with tons of libraries to assist in graphics tasks, audio and I/O. In the case of graphics/RSX, Sony provided GCM to build raw commands and PSGL, built on top of GCM, to provide an OpenGL ES API. To write shaders, Nvidia provided Cg, a shader compiler that parses a language similar to GLSL (the shader language defined by OpenGL).

License-free development

With the advent of native Homebrew (running on GameOS, not OtherOS), new open-source SDKs were created to bypass the dependency on Sony’s copyrighted libraries and therefore prevent copyright litigation. One example is PSL1GHT, an SDK used in conjunction with ps3toolchain [69], to provide a full development suite ready for developing legal Homebrew (though this requires a modified/hacked console with signature checks disabled).

Back in 2018, I built my own suite based on ps3toolchain but distributed in the form of a Docker container [70], so developers wouldn’t need to compile ps3toolchain and could instead download my pre-compiled setup (saving many hours of compilation time). The container also bundled many tools like Nvidia’s Cg shader compiler to mitigate dependency problems that I found while experimenting with PSL1GHT-based projects. In the end, it was a fun experiment that helped me learn more about the development environment.

Outsourcing development

It’s worth pointing out that a peculiar business-to-business model rose in popularity during that time: Game Engines. Instead of spending time and money developing a game from the ground up, why not buy the codebase of other companies and build the game on top of that? This is what game studios like Epic Games envisioned [71]. Apart from selling popular game titles such as Unreal Tournament 3, the studio licensed a stripped-down version (without the assets) to other developers. This was packaged and named ‘Unreal Engine 3’. In a nutshell, game engines take care of all the fundamental areas (physics, lighting, etc.) so developers only have to add their custom content (scripts, textures, models, sounds, etc.).

Game engine licensing is not a new business model, but due to the challenging environment of the PS3, game engines eventually became another attractive option for development.

Storage Medium

Now that we’ve finished talking about game development, it’s time for distribution. So, here I describe the official distribution mechanisms available for PS3 games.

Example of retail game

New generation = new medium. As the advantages of DVD started to wear off and its limitations, voiced by the game industry (space limit) and the film industry (480i format) [72], became apparent, it was only a matter of time before Sony unveiled another standard for their new appliances. For this new console, the Blu-ray disc was chosen.

The Blu-ray, as the name indicates, is a new optical disc format that delivered higher storage density thanks to the use of blue light diodes [73], as opposed to red diodes used with DVDs. Since blue light has a shorter wavelength than red light, more information (pits and lands) can be squashed together in the same space [74]. As a result, Blu-ray discs provide a surprisingly large capacity (between 25 GB and 50 GB!) using the same plastic disc with the same dimensions as the CD/DVD.
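As a back-of-the-envelope check of that claim, the sketch below estimates the density gain from the laser alone. Bear in mind the inputs are not from this article: the wavelengths (650 nm for DVD, 405 nm for Blu-ray) and the numerical apertures of the lenses (0.60 and 0.85) are the commonly published figures for each format, and the model (spot width proportional to wavelength divided by numerical aperture) deliberately ignores other improvements such as more efficient data encoding.

```python
# Commonly published optical parameters (assumptions, not taken from this article).
DVD = {"wavelength_nm": 650, "na": 0.60, "capacity_gb": 4.7}
BLU_RAY = {"wavelength_nm": 405, "na": 0.85}

def spot_size(disc):
    """Relative width of the focused laser spot (smaller is better)."""
    return disc["wavelength_nm"] / disc["na"]

def density_gain(old, new):
    """Pits shrink in both dimensions, so the gain grows with the square."""
    return (spot_size(old) / spot_size(new)) ** 2

gain = density_gain(DVD, BLU_RAY)
estimate_gb = DVD["capacity_gb"] * gain
print(f"{gain:.1f}x denser, roughly {estimate_gb:.0f} GB per layer")
```

The estimate lands close to the 25 GB of a single-layer disc, which suggests that the smaller laser spot explains most of the jump in capacity.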

The Blu-ray data format responds to many needs from different industries: high definition film, digital rights management (DRM), region locking, a new file system and even a runtime environment for Java programs [75]. In the case of the video-game industry, retail games for Playstation 3 were distributed on 25 GB or 50 GB Blu-ray discs with copy protection. These are read by a 2x drive reaching speeds of up to 8.58 MB/sec [76], though the PS3’s laser can also read DVDs (at 8x speed) and CDs (at 24x speed) to play old games and movies.
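To put the drive’s speed into perspective, here is a small worked calculation. It assumes the commonly quoted base rate of 36 Mbit/s for a 1x Blu-ray drive (a figure that doesn’t come from this article) and that the ‘MB’ above is counted in binary units (1 MB = 2^20 bytes). The function names are just illustrative.

```python
BD_1X_BITS_PER_SEC = 36_000_000   # assumed base rate of a 1x Blu-ray drive

def drive_rate_bytes(speed_multiplier):
    """Sustained transfer rate in bytes per second."""
    return speed_multiplier * BD_1X_BITS_PER_SEC // 8

def full_read_minutes(capacity_bytes, speed_multiplier):
    """Time needed to read a whole disc from start to end."""
    return capacity_bytes / drive_rate_bytes(speed_multiplier) / 60

rate = drive_rate_bytes(2)                       # the PS3's 2x drive
print(f"{rate / 2**20:.2f} MB/s (binary megabytes)")
print(f"{full_read_minutes(25 * 10**9, 2):.0f} minutes for a full 25 GB disc")
```

Under those assumptions, the 2x rate matches the quoted 8.58 MB/sec, and reading a complete single-layer disc takes more than three quarters of an hour. That gives an idea of why later games cached assets on the hard drive.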

While launch titles execute from the disc, later games copied part of their assets to the Hard Drive to increase reading speeds. Nevertheless, the game disc is always needed to kickstart the game.

Network service

Apart from the online store, there were many more online solutions added to the platform, including the debuting Playstation Network, a free online service competing directly with Microsoft’s paid-for Xbox Live.

Playstation Network enabled users to create a personal account and assign an avatar, then use that new digital persona for multiplayer gaming, messaging and other social interactions. Users can also earn trophies if they complete a certain event in a game, and these then show up in the online profile (as some form of medal of honour) to intimidate rivals and gain the respect of friends, I think.

Games offer an achievement catalogue to challenge their users.
The intent is to provide players with a sense of pride and accomplishment

Friends list
(Names are redacted for obvious reasons)

After doing some online gaming for a bit, random people started sending me messages

Last but not least, just like the operating system, games are updatable too. Hence, upon launching a game, XMB may suggest downloading game updates (in the form of ‘packages’) that patch glitches and/or add new content. Updates install on the hard drive and work similarly to a layered file system.
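If the idea of a ‘layered file system’ sounds abstract, the following sketch may help. It models each layer as a dictionary and resolves a file by checking the newest layer first, falling back to the disc otherwise. This is only a conceptual illustration: the class, the file paths and the contents are made up, and GameOS’ real mechanism is not documented here.

```python
class LayeredFS:
    """Resolve files by searching layers from highest to lowest priority."""

    def __init__(self, *layers):
        # Layers are given from the most recent patch down to the original disc.
        self.layers = layers

    def read(self, path):
        for layer in self.layers:
            if path in layer:
                return layer[path]      # first match wins, hiding older copies
        raise FileNotFoundError(path)

# Hypothetical contents of a game disc and of a patch installed on the hard drive.
disc = {"USRDIR/EBOOT.BIN": b"v1.00 code", "USRDIR/level1.dat": b"level 1"}
patch = {"USRDIR/EBOOT.BIN": b"v1.01 code", "USRDIR/dlc.dat": b"new content"}

game = LayeredFS(patch, disc)
print(game.read("USRDIR/EBOOT.BIN"))    # patched executable
print(game.read("USRDIR/level1.dat"))   # untouched file, still read from the disc
```

The appeal of this design is that the disc never needs to be modified: a patch only has to ship the files that changed or were added.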

Anti-Piracy and Homebrew

Everything you’ve just read has to be protected somehow against ‘unauthorised’ access. If you want an overview of how Sony carried that out, you are in for a treat.

Security foundation overview

Many parts of the console already provide security features that don’t require any manual implementation in software:

SysCon, the obscure proprietary chip (briefly mentioned in the boot process), controls the power lines of Cell, RSX and Southbridge. Its EEPROM contains records read by the operating system’s modules to determine which functions are enabled and which are not [80].
- Though I use the word ‘obscure’, SysCon is just a microcontroller, either an off-the-shelf ARM7TDMI-S (that’s right, the PS3 shares some of its DNA with the Game Boy Advance and even late PS2 revisions) enhanced with MagicGate support, or a custom NEC 78K0R variant [81]. SysCon’s internal firmware is what intrigues the most.
- SysCon and Cell communicate with each other using a serial interface (SPI) which plugs into Cell’s TEST component [82]. TEST provides many debugging functions on Cell, although SysCon only connects to the ‘Pervasive logic’ port, enabling SysCon to manage areas like power or thermal [83].
Cell houses a hidden ROM that stores unencrypted boot routines without worrying about snoopers.
Cell’s privilege modes and SPE’s isolated mode prevent programs from accessing unauthorised resources.
The Southbridge seamlessly encrypts the hard drive’s content using AES.
The Blu-ray subsystem is another walled fortress, and its disc content is encrypted using a key found in the ‘ROM mark’ area of the disc (inaccessible by conventional readers) [84].

On top of this, Sony implemented the following protections in software:

A complex Chain of trust that starts with Cell’s unencrypted boot ROM and ends with a graphical user interface (XMB) that only loads encrypted binaries (by Sony) under a kernel and hypervisor.
- The chain of trust implements multiple encryption algorithms, including asymmetric ones like RSA and ECDSA and symmetric systems like AES; combined with HMAC and SHA-1 (to confirm the integrity of data).
Some encryption keys are produced during manufacturing, meaning if hackers find and leak these keys, they will not work on other consoles. Though this comes at a cost of Sony not being able to patch software encrypted with those keys once the console leaves the factory.
- These special keys are used for bootldr and metldr (the early boot stages).
Games must call the kernel to access the hardware, which in turn asks the Hypervisor. This ‘abstraction onion’ prevents game exploits from escalating privileges, in theory.
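To illustrate the integrity half of a chain of trust, here is a minimal sketch using HMAC with SHA-1 from Python’s standard library. Each stage carries a tag, and the loader refuses to continue as soon as one tag doesn’t match. Keep in mind this is a simplification under my own assumptions: the key, the stage contents and the function names are hypothetical, the stage names are only borrowed as labels, and the real chain also relies on encryption and asymmetric signatures, which are not modelled here.

```python
import hashlib
import hmac

def sign(key, image):
    """Compute an HMAC-SHA1 tag over a stage's binary image."""
    return hmac.new(key, image, hashlib.sha1).digest()

def verify_chain(key, stages):
    """Check each (name, image, tag) in boot order; stop at the first bad tag."""
    booted = []
    for name, image, tag in stages:
        if not hmac.compare_digest(sign(key, image), tag):
            raise ValueError(f"{name}: integrity check failed")
        booted.append(name)
    return booted

KEY = b"hypothetical-per-console-key"   # placeholder, not a real key
images = [("bootldr", b"stage 1"), ("lv0", b"stage 2"), ("lv1", b"hypervisor")]
chain = [(name, image, sign(KEY, image)) for name, image in images]

print(verify_chain(KEY, chain))         # every stage passes the check
```

If a single byte of any image changes, its tag no longer matches and `verify_chain` raises an error before the next stage gets to run. That is why attackers target the keys themselves rather than the binaries.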

Defeat

You’ve seen how much this console is capable of. Did you expect hackers to settle for the limited features of OtherOS? I guess Sony didn’t either: the company tried hard to protect some areas while leaving others half-closed, as hackers would later demonstrate.

Let’s take a look at how some of its strongholds were cracked by independent hackers across the world. Bear in mind the PS3 hacking community was very active, with many tools and documentation produced every year. So, I’m going to focus on a few milestones that paved the way to an influx of content and homebrew development, but you can find more info at PS3History [85].

In 2010, after three quiet years in the hacking scene, the community took a turn for the better. George Hotz, a hacker known for previously unlocking the first iPhone model (a.k.a. the ‘2G’) so it could work with any network (originally only on Cingular/AT&T), managed to read and write protected areas in memory without being stopped by the Hypervisor. He then published his exploit along with a short summary in his blog [86].

The exploit requires two ingredients: a Linux installation running under OtherOS (for arbitrary, yet limited, code execution) and an external glitcher connected to the XDR bus (interfacing main RAM). To make a long story short, the hypervisor uses a hash table stored in main RAM to catalogue memory addresses along with their privilege levels, so user programs can’t access protected memory spaces. The attack works by breaking the integrity of that table to be able to write over it, and then using that privilege to modify the entries and grant the current program access to every corner of memory.

In summary, Hotz discovered that under Linux/OtherOS, programs can request from the Hypervisor many blocks of memory pointing to the same physical address, but if the program deallocates them while there’s an external interference in the XDR bus (due to a glitcher sending electrical pulses), the deallocation process ends half-done [87]. As a consequence, the hypervisor’s hash table (residing in RAM) still contains an entry for the allocated addresses, but at the same time, it thinks that space has been freed. Hotz’s exploit then proceeds to request more blocks so the Hypervisor extends its table with more entries, and the process continues until an entry of the hash table overlaps the memory location of the block that was supposed to be deallocated. Since the hash table kept the old entry granting the user access to that address, the hypervisor ends up giving the user access to modify a hash table entry! Thus, the exploit amends the entry to extend access to all memory space.
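The sequence of events is easier to follow with a toy model. The sketch below is not how the real hypervisor is implemented (its data structures are far more involved). It just reproduces the logic of the bug under simplified assumptions: memory is a handful of numbered pages, permissions are a set of (user, page) pairs, and a ‘glitch’ flag stands in for the electrical pulse that leaves the deallocation half-done. All names are hypothetical.

```python
class ToyHypervisor:
    def __init__(self):
        self.mappings = set()          # (user, page) pairs the user may write to
        self.free_pages = [3, 2, 1]    # pages available for allocation
        self.table_pages = set()       # pages holding the hypervisor's own table

    def allocate(self, user):
        page = self.free_pages.pop()
        self.mappings.add((user, page))
        return page

    def deallocate(self, user, page, glitch=False):
        self.free_pages.append(page)   # the page is marked as free...
        if not glitch:
            # ...and, normally, the user's access to it is revoked as well.
            self.mappings.discard((user, page))
        # With a glitch, the second step never happens: a stale entry remains.

    def grow_table(self):
        page = self.free_pages.pop()   # the table expands into a 'free' page
        self.table_pages.add(page)
        return page

    def can_write(self, user, page):
        return (user, page) in self.mappings

hv = ToyHypervisor()
page = hv.allocate("linux")
hv.deallocate("linux", page, glitch=True)     # deallocation ends half-done
table_page = hv.grow_table()                  # table now lives in that page
exposed = hv.can_write("linux", table_page)   # stale entry still grants access
print("User can edit the hypervisor's table:", exposed)
```

Without the glitch, the same sequence leaves the user with no access to the page, which is why the hardware interference is an essential part of the attack.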

While this exploit required a Linux-running-under-OtherOS environment, it was a huge step towards further reverse-engineering and research projects, since hackers were now able to investigate critical areas of the system that were originally inaccessible. It’s worth mentioning that, around the same time, Sony released software update 3.21 removing OtherOS. You would think that this would deter hackers from continuing their work, but it just gave them more reasons to speed it up.

The Custom Firmware (CFW) era

Cracking metldr meant everyone was now able to create ‘official’ systems for the PS3. This resulted in an influx of GameOS ‘flavours’ which different communities produced with various customisations. These systems were modifications of Sony’s official firmware files (that Sony distributed as updates), re-packaged using Sony’s leaked keys so they could be installed anywhere. The result was called Custom Firmware (CFW) and became the de-facto method for hacking this console, that is, until Sony responded with tough measures.

My CFW installation with ‘VSH menu’ opened. This variant (called ‘Rebug’) also enabled me to turn my console into a debugging station (notice the IP address at the bottom right corner, you need to enter it on the debugger to attach to a running process) and fiddle with my own homebrew.

In the meantime, many CFWs appeared on the net under many names (e.g. ‘Rebug’, ‘Ferrox’, etc.) and they contained customisations such as [95]:

Disabling signature verifications on any module installed or to be installed.
Enabling reads and writes (the classic peek and poke) over any memory address, either using the Hypervisor (level 1) or the Kernel (level 2).
Activating hidden debug functions to install modules packaged as ‘pkg’ files. These didn’t need to be signed with Sony’s keys to work inside a CFW environment.
Allowing disc images to be mounted as if a Blu-ray disc were inserted.
Restoring OtherOS and even enhancing it by removing the restrictions imposed by the Hypervisor. The result was called OtherOS++.
Writing over Syscon EEPROM’s database to allow installing any system version of choice. This is also known as QA Toggling.
Altering the style of XMB (e.g. removing the epilepsy warning, allowing in-game screenshots, etc.).

There’s also my favourite one: bringing the debugging functions of a testkit, allowing any retail console to become a debugging station. This could be done either by installing a CFW with debugging capabilities, or a CFW that could convert the retail console (called ‘CEX’) into a debugging model (called ‘DEX’) by altering console-specific data in Flash memory.

Sony’s strong response

Similarly to the events that happened after CFWs were invented for the PSP, Sony retaliated on two fronts:

From the software side, Sony shipped two system updates that enhanced the security system:

With 3.56, binaries are signed with new encryption keys resilient to the previous ECDSA discovery [96]. Thus, CFW creators can’t customise the new binaries (since they don’t have the private keys to re-encrypt them). Furthermore, a new revision of the ‘system updater’ application is also shipped. This enforces the new certificates in system update files (PS3UPDAT.PUP), meaning that even if hackers manage to package a new CFW, only consoles with system version 3.55 or lower will be able to install it [97].
Later on, system update 3.60 revamped the boot process: it nullified metldr and promoted lv0 to take over in bootstrapping the loaders (lv1ldr, lv2ldr, appldr and isoldr). All in all, this meant hackers could not modify the new system files without first cracking lv0 (finding its private key).
- This eventually happened in late 2012, when a team called “The Three Musketeers” published the lv0 keys [98], which paved the way to new CFWs made from system versions newer than 3.55. However, due to the aforementioned changes in the updater, only users on system version 3.55 or lower (including any CFW with signature checks disabled) can install them.

From the hardware side, not only did subsequent PS3 models (late CECH-25xxx, CECH-3xxx and CECH-4xxx) come pre-installed with a system version higher than 3.55, but they also contain a different variant of bootldr/lv0ldr (called lv0ldr.1) that not only decrypts and loads lv0 but also fetches a new system file called lv0.2. The latter contains metadata about lv0 [99] to ensure that lv0 hasn’t been tampered with. lv0.2 is signed with a new key (also invulnerable to the previous ECDSA discovery), thus preventing hackers from taking control of the boot chain.

To this day, these models are not able to run a CFW, hence their nickname: unhackables. They can, however, run a ‘Hybrid Firmware’ (HFW), which we’ll discuss later on.

As time went by, the number of CFW-compatible consoles only decreased, thus, PS3s that weren’t updated past 3.55 became some sort of relics. In the meantime, there was a surge in demand for alternatives, like downgraders (to revert to system version 3.55 on old models) and ODEs (to play pirated games on new models).

Homebrew revival

After a long waiting period for users that missed the window to install a CFW, in late 2017, a team of hackers released PS3Xploit, a collection of exploits and utilities [100] that brought back the ability to install CFW on old models without needing an expensive downgrader (and skills to operate it).

PS3Xploit’s main payload replicates the job of a hardware downgrader (patching CoreOS files) entirely in software. It works as follows:

The starting point is the XMB’s internet browser, built on top of Webkit. PS3Xploit uses Javascript to gain arbitrary code execution within the system’s userspace (and outside Javascript’s environment). To kickstart this, users only have to open XMB’s native web browser, enter a URL pointing to PS3Xploit’s host and let it do its job.
It so happens the kernel provides system calls that can be used to overwrite the operating system’s files in Flash memory. On top of this, the Visual Shell (XMB) and its plugins store routines in memory that make use of those calls.
PS3Xploit can’t trigger those system calls directly due to the Hypervisor’s ‘no-execute’ protection, preventing the exploit from loading new code in userland. However, it can find a way to overwrite Flash memory by ‘borrowing’ Visual Shell’s routines.
Consequently, PS3Xploit proceeds to modify Webkit’s execution stack to redirect execution to Visual Shell’s routines. This type of technique (corrupting the stack to divert execution to other code residing in memory) is called Return Oriented Programming (ROP) and it’s very popular in the InfoSec field. One way of mitigating this is by implementing Address space layout randomisation (ASLR), which makes it difficult to guess the location of the routines (called gadgets) but, as you can guess, Sony’s hypervisor lacks ASLR.
Finally, those system calls are triggered with PS3Xploit’s parameters and so they replace CoreOS files (the first part of the operating system, stored in Flash memory) with patched ones [101].
The console is now able to install unofficial software updates, an opportunity the user can exploit to install a custom firmware. It can’t downgrade the system version yet, but once an up-to-date CFW is installed, the user can install further utilities to downgrade the system and install a better-equipped CFW, if they so wish.
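The core trick behind those steps, Return Oriented Programming, can be demonstrated with a toy interpreter. In the sketch below, the ‘code in memory’ is a fixed table the attacker cannot extend (mimicking the no-execute protection), while the stack is plain data the attacker can overwrite. The addresses and routine names are invented for illustration and bear no relation to the real Visual Shell routines or to PS3Xploit’s actual payload.

```python
log = []

# Code already resident in memory. The attacker cannot add entries here,
# which mimics a no-execute policy. Addresses are made up.
RESIDENT_CODE = {
    0x1000: lambda: log.append("render page"),   # what the browser normally does
    0x2000: lambda: log.append("open flash"),    # routines that belong to the shell
    0x2040: lambda: log.append("write flash"),
}

def run(stack):
    """Pop return addresses one by one and jump to each, like a chain of returns."""
    while stack:
        address = stack.pop()
        RESIDENT_CODE[address]()

# Normal execution: the stack returns into the browser's own routine.
run([0x1000])

# Exploit: only *data* (the stack) is overwritten, yet execution is diverted
# to existing routines. The last element is popped first, hence the order.
corrupted_stack = [0x2040, 0x2000]
run(corrupted_stack)

print(log)
```

Notice that the attacker never injected a single instruction: knowing where the useful routines live was enough. This is precisely the knowledge that ASLR tries to take away by shuffling addresses on every boot.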

As you can see, this gift from the sky brought custom firmwares back into the spotlight and rendered hardware downgraders and ODEs obsolete. On the other side, for those units which couldn’t install a CFW either way (the unhackables), the team later offered PS3Hen, a different exploit package that focused on enabling a subset of CFW functions (including the ability to execute homebrew). This one installs itself as an entry in XMB and the user must run it every time they power on their console to re-enable the execution of homebrew apps.

Sony’s partial response

As luck would have it, Sony only took small steps to block PS3Xploit (maybe because this turn of events happened years after the PS3’s successor, the Playstation 4, hit the stores). They released a few system updates that didn’t fix this chain of exploits but removed the routine used in Webkit for bootstrapping the chain. In response, hackers published slightly modified software updates that restored such entry (and somehow, they didn’t need to be re-signed) [102]. These custom updates were called Hybrid firmware (HFW) and at the time of this writing, they are the de-facto option used to enable homebrew on unhackable systems.

And here concludes the anti-piracy/homebrew saga. In my humble opinion, I don’t think Sony is interested in putting more effort into this console. So I wouldn’t expect any more cat-and-mouse games in this field.

That’s all folks

Two PS3s, one playing that game which-shall-not-be-named.
Aside from the buggy gameplay and unusual storyline, I quite enjoy the genre.

You made it to the end!

To be fair, I originally planned this to be a two-month project, but it turned into a whole summer one (and you’ve seen why). In any case, I hope this helped you expand your knowledge about this system and enabled you to understand the reasoning behind the technological progress during that era. This way, you can now think beyond the popular hearsay constantly recycled by the masses.

If you wonder, for this writing, I’ve used three PS3 models:

An unhackable CECH-3001 one from my teenage years (for some reason the box says it’s a CECH-25XX model!). It’s been recently taken out of the attic to try out PS3Hen.
A CECH-2100 that I bought after PS3Xploit came out, I was finally able to install homebrew.
A CECHA model (only released in Japan) that I acquired in August 2021 to gather material for this article (mostly photography and info about PS2 compatibility). It was quite pricey, luckily the donations of supporters helped me offset the costs.

While I repeated many times how Cell was groundbreaking technology, you may have noticed that I didn’t mention how unreliable the early models ended up being. It was the first time I heard about a console malfunctioning by just playing a game for a while. Indeed, these things ate electricity like dinosaurs and heated up like an oven (a plastic oven). Luckily, I had a slim model (should’ve been named ‘the-working-one’)… What a rushed era, huh?

Anyway, as for what’s next on my agenda, I’ll be taking some time off myself before starting the next article, so I can work on other areas to improve the website and catch up on personal stuff.

Until next time!
Rodrigo
