Xbox 360 Architecture

source link: https://www.copetti.org/writings/consoles/xbox-360/
Supporting imagery

The original Xbox 360 with an external HDD plugged at the top. Released on 22/11/2005 in America, 02/12/2005 in Europe and 10/12/2005 in Japan.
The 'new' Xbox 360 (a.k.a. 'Slim' or 'S'). Released on 18/06/2010 in America, 24/06/2010 in Japan and 16/07/2010 in Europe.
Another 'new' Xbox 360 (a.k.a. 'E'). Released on 10/06/2013 in America, 20/06/2013 in Europe and similar in Japan.
Main architecture diagram

A quick introduction

Released a year before its main competitor, the Xbox 360 was already claiming technological superiority over the yet-to-be-seen Playstation 3. But while the Xbox 360 might be the flagship of the 7th generation, it will still need to fight hard once Nintendo and Sony take up retail space.

This new entry of the console architecture series will give you an additional perspective of how technology was envisioned during the early noughties, with emphasis on the emerging ‘multi-core’ processor and the unorthodox symbiosis between components, all of which enabled engineers to tackle seemingly unsolvable challenges with cost-effective solutions.

Highly recommended pre-reading

The first section of this writing (titled ‘CPU’) makes recurring references to IBM and Sony’s joint venture, Cell. If you don’t know about that component, I highly suggest that you read my previous article about the PS3 beforehand. The latter is written as if the Xbox 360 never existed; this was intentional, to avoid confusing the reader with circular references. Hence, this new article completes the timeline by adding the Xbox 360 into the mix.

For those who are already familiar with the PS3’s architecture, I’ve structured this study similar to the PS3 article, so you’ll be able to compare both side-by-side objectively if you choose to do so.

Personal video introduction

Because this article is way longer than my previous ones, I also made a quick video where I (attempt to) present this article and give you an idea of the content you will find. You can watch it here. It’s my first ‘personal introduction’ so you’ll have to forgive my occasional fillers :)


Once again, we find ourselves with a new type of CPU that is unlike anything seen on the store shelves. After all, this is another 7th-generation console that reflects an obsessive need for innovation, a peculiar trait of that era.

Introduction

Before we discuss the architecture, I’ll start with a bit of history to bring you up to speed. The following paragraphs focus on the business aspect of the Xbox 360’s CPU, whose sequence of events you may find amusing, to say the least.

I’ll try to keep it short so we can focus on the main topics of this series, but if, in the end, you are interested in more, you may enjoy a book called ‘The Race For A New Game Machine’, written by former IBM engineers who worked on the project.

The beginning of Microsoft’s presentation at E3 2005, where Robbie Bach, J Allard and Peter Moore unveiled the Xbox 360 [1].

After enjoying the surprising success of the original Xbox, it was time for Microsoft to work on its successor. The company started by looking for vendors that could face off against Sony’s upcoming technology. However, unlike in their previous development, Microsoft now enjoyed the upper hand in the forthcoming negotiations.

To put things in context, back when the original Xbox project was still at an early stage, neither Intel nor Nvidia were willing to share their intellectual property with Microsoft. This decision limited Microsoft’s capacity to mould Nvidia’s or Intel’s chips for the specific needs of the Xbox.

For instance, the security subsystem that protected the console against the execution of unauthorised code was implemented outside these two critical chips. This made it vulnerable to snooping attacks that eventually paved the way for the execution of Homebrew and piracy. Moreover, Microsoft did not control the manufacturing stage either, so the production of Xbox consoles was at the mercy of Intel’s and Nvidia’s supply.

Well, now that Microsoft gained more leverage in the console market, they weren’t willing to give away those rights anymore.

The new CPU partner

In a turn of events, IBM agreed to share its IP and design a new multi-core processor, and so IBM became the Xbox 360’s CPU supplier. Although, you may remember this was the same IBM that had already signed an agreement with Sony and Toshiba to produce the Playstation 3’s CPU (‘Cell’). Apparently, IBM assumed Microsoft was not aware of the Cell project [3], and their current contract with Sony didn’t forbid them from selling to third parties.

All three companies [IBM, Toshiba and Sony]… legally all had rights to go and put any of that technology, any of those processor cores into other spaces. (…) It is very common to develop an interesting, leading-edge new technology and then utilize that technology across multiple platforms. (…) I guess what everyone didn’t anticipate was – before we even got done with the Cell chip and PS3 product – that we’d be showing this off specifically to a competitor. [4]

– David Shippy, chief architect of the Power Processing Unit (PPU)

Ironically, as of 2022, IBM’s PowerPC chips have vanished from both computers and video consoles. Maybe this episode set a bad precedent and greatly affected the trust IBM needed for future business? I’m afraid I don’t know the answer to that.

To sum it up, IBM signed an agreement with Sony and Toshiba to develop Cell in 2001. Two years later, in 2003, IBM agreed to supply Microsoft with a new low-powered multi-core CPU. Microsoft’s CPU will be called Xenon and will inherit part of Cell’s technology, with extra input from Microsoft (focusing on multi-core homogeneous computing and bespoke security). Additionally, while IBM would market Cell within its ‘BladeCenter’ line of servers, Xenon could only be fitted on an Xbox 360 motherboard.

Cell’s half-sibling

Now that we’ve positioned Microsoft and IBM on the map, let’s talk about the new CPU. This is how Xenon materialised at the end of the Xbox 360 project:

Simplified view of Xenon.
A diagram of Cell for comparison purposes. Cell also includes 32 KB of ROM not shown here.

Don’t worry, all of these components will be explained throughout this article, starting with the ‘PPE’ blocks shown at the top left corner.

A new look at CPU history

The Cell project, with its obsession with vectorised computations, introduced very interesting proposals to radically tackle the ongoing constraints that hindered technological progress. These constraints, however, are very complex. So it wouldn’t be fair to say that Cell’s methods were the only (or even the best) solution. In other words, by studying Xenon’s architecture (which competed side-by-side with Cell’s), we can gain a different perspective on how CPU architecture evolved throughout that era (the early 00s) and subsequently influenced the next decade of CPUs.

Xenon chip, surrounded by an army of decoupling capacitors.

By doing so, you’ll perceive that Xenon takes a more conservative approach than Cell. If we take another look at the previous diagram of Xenon, you’ll notice that it is equipped with the famous PowerPC Processing Element (PPE), which is also the most important piece of Cell. However, Xenon is now equipped with three of them. Additionally, the Synergistic Processor Units (SPUs) are no more.

After all, Microsoft didn’t want processors of a very different nature squashed into their CPU. They instructed IBM to compose three powerful cores and enhance them with the ingredients game developers would expect to find. With this approach, IBM and Microsoft were also able to add non-standard features without disrupting the traditional modus operandi of developers.

Truth be told, this also resulted in aggressive budget cuts to keep this design (and the rest of the system) at a competitive price range. To put it in context, multi-core CPUs for PCs weren’t on the store shelves while IBM was building Xenon, and when they debuted in 2005 (coincidentally, the same year the Xbox 360 reached the stores), AMD priced their cheapest Athlon X2 at $537 (equivalent to ~£452 in 2021 money) and Intel charged $241 (equivalent to ~£203 in 2021 money) for their low-end Pentium D [5] - and let’s not forget the box only included the CPU.

How this study is organised

We’ll now take a look at the main components that comprise Sony’s counterpart. To avoid repeating existing information, I’ll focus on the novelties of Xenon.

Having said that, the new CPU runs at 3.2 GHz and it includes so much circuitry that, for this study, we have to split it into different groups:

  • The three leaders that execute the program’s instructions. At first, each resembles Cell’s PowerPC Processing Element (PPE), but you’ll soon see that they are actually a superset of it. Additionally, since we’ve got three of them now, it may seem as if the whole chip behaves like a Cerberus monster, where each core may claim control of the whole system. Alas, that’s not feasible in a computer, so the first core is the designated master core while the others take assistant roles.
  • A single interface that interconnects the cores with the rest of the system. This bus is called XBAR (pronounced ‘Crossbar’).
    • Like in Cell, there are other proprietary interfaces used for debugging or maintenance (e.g. temperature monitoring), but these will not be mentioned until we reach the ‘Anti-piracy’ section.
  • The security block which Microsoft oversaw to implement the anti-piracy system. It’s a very complex section, so to avoid overwhelming you with information, I’ll explain it in the ‘Anti-piracy’ section as well.

The different approach for Xenon

To explain the aforementioned groups, I’ve organised the study of Xenon into these areas, in that order:

  1. The bus connecting all the cores, the XBAR and its special L2 cache block.
  2. The new refinements of the PowerPC Processing Element (PPE).
  3. The unusual abundance of general-purpose memory.
  4. The new programming model suggested (and, in some ways, enforced) by Microsoft.

Inside Xenon: The messenger

The original chip (Cell) was required to house twelve independent nodes actively moving data around; this forced IBM engineers to devise a complicated system that could tackle emerging bottlenecks, which materialised in the form of the Element Interconnect Bus (EIB). With the Xbox 360, Xenon only accommodates three units (the three PPEs), so the EIB has no purpose here. Thus, a simpler solution called XBAR was produced to focus solely on the three PPEs, with space for extra functionality.

XBAR relies on a mesh topology that doesn’t direct traffic in a token-style manner. Instead, each node is provided with a dedicated lane to move its data through [6]. This may appear more efficient than the token topology of the EIB, but that’s only because the XBAR needs to serve a small number of nodes. Furthermore, the XBAR operates at full speed (3.2 GHz).

Simplified diagram of the XBAR/Crossbar combined with the L2 component.
The architecture of Cell’s EIB (found in the PS3) for comparison purposes.

To be fair, until now I’ve only talked about the particular interface that interconnects the PPEs. Well, the XBAR is just one piece of the sizeable chunk IBM designed for Xenon. It turns out the leftover space gave them room to incorporate another very important block that speeds up the transactions between the PPEs and the rest of the system: the L2 cache.

Shared cache

Once the PPEs pass through the XBAR, they get access to 1 MB worth of L2 cache. The L2 block is also connected in a meshed manner, where each PPE gets a separate 256-bit bus but is now clocked at 1.6 GHz (half the PPEs’ speed) [7].

Coincidentally, Cell also houses 512 KB of L2 cache for its single PPE. I only described it in one sentence in my previous article. Now, however, we find ourselves with a larger space that’s also shared between three cores, so I find it necessary to dive into the properties of the L2 block so we can understand how it will condition the performance of the PPEs.

First things first: every time the cache fetches data from memory (in the event of a ‘cache miss’), it does so by pulling a large slice called a ‘cache line’, which is 128 bytes wide in the case of Cell and Xenon. Then, L2 records the cache line on an internal list for locating it in the future. Moreover, in Xenon/Cell, L2 is 8-way associative, which means that each cache set may store up to eight different cache lines. Don’t worry if you don’t know what this means; the theory behind CPU caches can be hard to follow, especially if you only want to learn about videogame consoles. In layman’s terms, the greater the number of associations, the lower the probability of cache misses, but the slower it becomes to iterate through the internal list.
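To make the arithmetic concrete, here’s a minimal sketch of how an address maps into a cache with these dimensions. It assumes a simple modulo-indexed design (real hardware indexes with bit masks, and the function name is mine, not IBM’s):

```python
# Toy model of a 1 MB, 8-way set-associative cache with 128-byte lines
# (the L2 figures quoted above).
CACHE_SIZE = 1024 * 1024  # 1 MB in bytes
LINE_SIZE = 128           # bytes fetched per cache miss
WAYS = 8                  # cache lines stored per set

NUM_LINES = CACHE_SIZE // LINE_SIZE  # 8192 lines in total
NUM_SETS = NUM_LINES // WAYS         # 1024 sets of 8 lines each

def set_index(address: int) -> int:
    """Drop the offset within the line, then wrap around the sets."""
    return (address // LINE_SIZE) % NUM_SETS

# Addresses exactly NUM_SETS * LINE_SIZE bytes apart (128 KB here) land
# in the same set, so only 8 such lines can coexist before evictions begin.
assert set_index(0x00000) == set_index(0x20000)
```

This also shows why the ‘internal list’ lookup matters: on every access, the cache only needs to compare the eight lines of one set, not all 8192.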


Layout of cache in Xenon

The choice of an 8-way associative cache was not a rash decision for Xenon, as providing eight associations alleviates contention between the six simultaneous threads (each PPE is dual-threaded) trying to access the L2 block at the same time. This also balances frequent cache misses against lookup times, all while keeping the costs down. For comparison purposes, Intel’s expensive ‘Smithfield’ (a Pentium D from 2005) provides two cores with 2 MB of L2 cache each! [8]

It wouldn’t be the first time Microsoft trimmed the cache for pricing purposes, so it will be up to developers to optimise its usage. To assist with this, XBAR bundles additional logic like ‘Cache locking’ which will be further explained in the ‘Graphics’ section.

Inside Xenon: The leader(s)

This is as far as we go with our description of the XBAR and L2 cache, let’s now talk about the actual CPU. Just a reminder that, since I’ve already dissected the standard PPE (found in Cell) for the PS3 article, I’ll be focusing on the novelties here.

To start with, Xenon’s PPEs don’t feature a PowerPC Processor Storage Subsystem (PPSS) anymore, presumably since the interfacing part is handled by the XBAR and the L2 cache is now shared across the three units.

Simplified diagram of Xenon’s PowerPC Processing Element (PPE).
For comparison purposes, this is Cell’s counterpart.

To be honest, I’m not sure why technical manuals keep calling Xenon’s PPEs a ‘PPE’ as they better resemble a PPU.

Anyway, let’s go over the significant changes that Microsoft liked to brag about when being compared with the Playstation 3.

The new vector units

As the Synergistic Processor Elements (SPEs) were Sony’s secret weapon and Microsoft was only interested in homogeneous systems, IBM presented an alternative approach to speed up the manipulation of vectors and matrices in Xenon [9]. In a nutshell, IBM boosted the VMX unit (the SIMD block found within the PPE), which evolved into VMX128, now housing more registers and opcodes.

The initial VMX specification implemented in the Playstation 3 provides 32 128-bit registers and instructions for operating on up to four 32-bit scalars at a time. This worked to an acceptable degree with general-purpose applications that depend on SIMD operations, although the real performance would be unlocked once the SPEs are added into the equation. By contrast, Microsoft wanted programmers to port SIMD-hungry applications without extra hassle, and the new VMX128 unit reflects that.

VMX128 supplies 128 128-bit registers instead, along with an adapted instruction set to manipulate the larger set of registers. To accomplish that, IBM changed the opcode format to allocate 7 bits (as opposed to 5 bits) for referencing its extended register file [10]. This is possible thanks to some trickery applied on the last five bits at the end of the 32-bit opcode [11]. The last five bits are mostly, but not completely, unused by VMX. Consequently, VMX128 is incompatible with a subset of VMX instructions (related to integer multiplication and additions).
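A quick bit-budget calculation shows why that trickery was needed. This is only a sketch of the arithmetic (the real VMX128 encoding scatters the extra register bits across the instruction word rather than keeping contiguous fields):

```python
# Every PowerPC instruction is 32 bits wide, with a 6-bit primary opcode.
# Most vector operations name three registers: destination + two sources.
OPCODE_BITS = 32
PRIMARY_OPCODE = 6
OPERANDS = 3

def spare_bits(register_field_width: int) -> int:
    """Bits left for the secondary opcode after encoding the primary
    opcode and three register operands."""
    return OPCODE_BITS - PRIMARY_OPCODE - OPERANDS * register_field_width

# Classic VMX: 5-bit fields address 2**5 = 32 registers and leave
# 11 spare bits, plenty of room for secondary opcodes.
assert spare_bits(5) == 11
# VMX128: 7-bit fields address 2**7 = 128 registers but leave only
# 5 spare bits, hence the reuse of formerly spare bits and the loss of
# some VMX integer instructions.
assert spare_bits(7) == 5
```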

Furthermore, VMX128 adds new instructions that compute the dot product of two vectors composed of up to three 32-bit floating-point numbers, and others that handle Direct3D’s data compression formats [12] (it’s worth mentioning that DirectX is the de facto API for programming this console). Finally, thanks to the symmetric design of Xenon and its multi-threaded model (two hardware threads per core), VMX128’s register file is duplicated, so there are 256 128-bit registers per core!
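As a point of reference, here is the scalar work that a single one of those dot-product instructions collapses (a plain-Python illustration of the operation, not a real mnemonic):

```python
def dot3(a, b):
    """The 3-component dot product: multiply lane-by-lane, then sum.
    VMX128 performs all of this in one instruction."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

# A typical use in 3D shading: how aligned a surface normal is with a
# light direction. Done as scalar maths, this costs three multiplies
# and two dependent adds per vertex or pixel.
normal = (0.0, 1.0, 0.0)
light = (0.0, 0.6, 0.8)
assert dot3(normal, light) == 0.6
```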

As we reach the end of this section, there’s still one question left unanswered: which is faster for vector operations, 1 VMX + 6 SPEs (as in the PS3) or 3 VMX128 units (as in the Xbox 360)? Well, their designs are too divergent, so it’s hard to quantify. One could say ‘The SPE can execute up to two instructions per cycle while Xenon takes 12 cycles to add two vectors (due to the long pipeline of the PPE)’ but that’s relative, as the SPE’s memory scope is restricted to its local memory (requiring DMA calls to interact with the outside), while Xenon’s PPEs can access any memory location. So, in conclusion, these are two contrasting models and programmers will just have to get the best out of them.

A new but short-lived instruction

With the advent of a larger cache system, Microsoft and IBM extended the PowerPC instruction set to accommodate some instructions that operate these blocks from the program side, in case programmers wish to make manual interventions (such as caching data ahead of time or saving L2 space). In any case, the standard PowerPC specification already provides the dcbt (‘Data Cache Block Touch’) instruction to fetch data from memory into the L1 cache (individual for each core) [13]. Nonetheless, Microsoft took a step forward and added xdcbt (‘Extended Data Cache Block Touch’) to fill the L1 cache without even going through L2 (shared across all cores) [14]. This makes sense as the shared L2 is a new addition of Xenon. However, bypassing L2 is a risky operation as two cores may end up seeing different data (cache incoherency), so it will require correct handling from the program side to keep cache incoherency and race conditions out of the way.

As luck would have it, xdcbt worked fine until Microsoft started receiving error reports from game studios [15]. Initially, Microsoft relied on this instruction for common routines available through their API (such as memcpy()), but they didn’t account for the consequent lack of cache coherence. Thus, uses of xdcbt were subsequently removed from their APIs. Yet, studios found a bigger problem: branch instructions followed by xdcbt always end up executing the latter. It turns out the branch predictor will speculatively execute xdcbt (like any other instruction that may be undone later on if the prediction turns out to be incorrect), with the exception that once the cache blocks are mangled, there’s no way to undo it (becoming a hazard). Thus, the program will not synchronise the caches, as it assumes xdcbt hasn’t been triggered, leaving the PPEs in a disarrayed state.

In the end, Microsoft purged xdcbt from the compiler due to its non-deterministic (and thus unpredictable) behaviour, and what’s left of it is just an anecdote.

Revisiting old paradigms

There’s a recurring subject found in noteworthy writings from 2005 like ‘Inside the Xbox 360’ by Jon Stokes [16] or ‘Understanding the Cell Microprocessor’ by Anand Lal Shimpi [17], and that is the lack of out-of-order execution, which once debuted in early PowerPC chips like Gekko but for some reason is completely absent in Cell & Xenon. If you recall from the Gamecube article (which I wrote two years ago), Gekko is an out-of-order CPU, meaning it’s able to analyse the instruction stream as instructions come in and subsequently re-order them to better distribute the load across Gekko’s internal units.

By then, CPU cores employing out-of-order execution were in the order of the day (pun intended). IBM’s PowerPC 604 (1994) brought it to high-end Macintosh computers, Intel’s P6 (1995) introduced it to the x86 line and MIPS implemented it with the R10000 (1996) CPU, a successor of the R4000 (found on the Nintendo 64). Afterwards, all of a sudden, Cell and Xenon arrive with an in-order execution style… care to explain?

To understand this turn of events, let’s take a closer look at the out-of-order execution (OoO) model. OoO is not a Lego piece that engineers fit in and then move on to the next part. Pipelined CPUs are susceptible to a whole new range of hazards, which means that the slightest addition may lead to a complete redesign. For instance, OoO requires altering the register file to fit register renaming [18], a clever technique where the CPU houses more registers than the instruction set references, enabling the CPU to store multiple copies of the program state and prevent the dependency problems that arise after altering the original order of the instructions. All in all, while these solutions are interesting, they add complexity to the CPU chip, and that can pose a threat to future scalability.

The PowerPC Processor Element (PPE) derives from the POWER4, which is itself an OoO core. However, POWER4 is an independent chip by itself, while the PPE is either surrounded by 7 SPEs (as with the Playstation 3/Cell) or by two more PPEs (as with the Xbox 360/Xenon). These additions take up space, and the dimensions of the CPU chip are critical in terms of price and heat emission. So, in the end, I imagine IBM didn’t see enough advantages in applying OoO to an application-specific machine like the PS3 or Xbox 360.

An alternative solution

Look at it this way, why was OoO invented in the first place? To prevent the CPU from idling (as idling = wasted resources). So, will the applications (3D gaming) be prone to this issue? Maybe, but it’s also true that OoO is not the only possible solution to this problem! Introducing Thread level parallelism (TLP).

What if, instead of sorting out which instructions should be executed first, we duplicate the resources into two (or more) groups called ‘threads’ and let the program switch between the different groups of resources (multi-threading) as it sees fit? This is what IBM’s engineers went for, hence the reason the PPE is dual-threaded. The new technique, called Thread level parallelism (TLP), differentiates itself from Instruction level parallelism (ILP) by letting the program, as opposed to the CPU, come up with its own solution. With Xenon and Cell, it’s now left to the compiler and the program’s multi-threading implementation to produce an efficient sequence of instructions.

The interesting thing is that neither approach is better or worse, out-of-order processors are still found in the market (Intel/AMD still support OoO along with an obscene amount of other techniques, while ARM adopted OoO with the Cortex-A8 in 2005). On top of that, those CPUs have multi-threaded cores and even bundle multiple cores within the same chip, so you get a mix of both techniques (TLP and ILP).

Inside Xenon: Main Memory

Unlike its competitor equipped with 256 MB of XDR DRAM, there’s 0 MB of external memory installed next to Xenon. That’s right, a 100% decrease compared to the Japanese counterpart.

Ok… let me rephrase that. On the motherboard, multiple chips provide a total of 512 MB of GDDR3 SDRAM, but they are sitting next to the GPU, not the CPU. So… you probably guessed it. Just like the original Xbox, Microsoft went for the Unified Memory Architecture (UMA) layout, where all components share the same RAM chips. This provides more flexibility in the amount of memory reserved for the CPU or GPU. However, a single point of access for various components means the components further away (like the CPU) will suffer higher latency compared to the hypothetical case where the CPU was given dedicated memory. This was duly noted by IBM and Microsoft engineers and subsequently tackled with the aforementioned L2 cache, along with extra circuitry (which is explained in the ‘Graphics’ section).


Xenon next to ‘Xenos’ (the GPU) guarding 256 MB of GDDR3, the remaining half is on the back.

The type of memory chip (GDDR3) is the same one found in the Wii and PS3. Conversely, the Xbox 360 gets the crown for including the largest amount. In fact, it’s been reported that Microsoft initially planned to house 256 MB, but later doubled the amount out of fear of Sony’s upcoming competitor [19]. Though, unlike Sony, Microsoft was extra careful with the retail price, so it had to cut down on other features. That’s why the hard drive was made optional. Furthermore, Microsoft expected Samsung (its memory supplier) to deliver chips operating at 1.4 GHz, but then settled for 700 MHz at the time of shipping the console.

Memory Controller

With all that being said, how can Xenon access this memory? Well, the CPU communicates with the GPU through an interface called the Front-side bus, which uses two unidirectional buses: the ‘CPU→GPU’ lane is 64 bits wide and runs at 1.35 GHz, while the ‘GPU→CPU’ one is 128 bits wide and runs at 675 MHz [20]. If we do the maths, each lane provides a bandwidth of 10.8 GB/s.

The Front-side bus route only takes you to the GPU. So, between the GPU and the GDDR3 chips, two memory controllers inside the GPU manage this connection using one 1024-bit bus each. To reduce latency, the memory controllers use address tiling for GPU-related operations and path-finding to reduce CPU congestion. In general terms, Microsoft states that there’s a bandwidth of 22.4 GB/s between GPU and memory.
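These figures check out with back-of-the-envelope maths. The sketch below recomputes them, assuming the quoted 22.4 GB/s corresponds to an aggregate 128-bit GDDR3 interface transferring twice per 700 MHz clock (double data rate):

```python
def bandwidth_gb_per_s(bus_width_bits: int, transfers_per_second: float) -> float:
    """Bytes moved per second: width in bytes times transfer rate."""
    return (bus_width_bits / 8) * transfers_per_second / 1e9

# Front-side bus: both unidirectional lanes land on the same figure.
cpu_to_gpu = bandwidth_gb_per_s(64, 1.35e9)   # 64-bit lane at 1.35 GHz
gpu_to_cpu = bandwidth_gb_per_s(128, 675e6)   # 128-bit lane at 675 MHz
assert cpu_to_gpu == gpu_to_cpu == 10.8

# GDDR3 at 700 MHz, two transfers per clock, 128 bits in total.
memory = bandwidth_gb_per_s(128, 2 * 700e6)
assert round(memory, 1) == 22.4
```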

It’s all jolly on paper, but let’s not forget that the CPU still has to walk a long road to get to memory. To give you an idea, if you grab PIX (the CPU profiler included with the SDK) and run the sample tests that come with the utility, you’ll see that for every cache miss, the CPU spends ~600 cycles getting to memory! This is one of the features of UMA-based systems you don’t see marketed very often :), but it also helps you understand why cache is so critical.

Inside Xenon: Programming styles

With all being said, how can anybody take advantage of this fascinating chip? Well, there’s only one way: Multi-threading.

From the programming perspective, a CPU made of multiple homogeneous cores sharing the same memory is what we call a Symmetric Multi-Processing (SMP) design. This conditions how the application will be designed and implemented. As of 2022, this is the de facto layout for consumer multi-core CPUs (e.g. x86 and ARM) as well.

SMP programming abstracts access to physical CPU cores with the use of ‘virtual threads’. A virtual thread is a sequence of instructions the programmer defines. Threads can then be submitted to a ‘scheduler’ for their execution. The scheduler is another program (often part of the operating system) that handles how virtual threads are dispatched to the physical CPU cores.

This abstraction layer allows the programmer to avoid targeting a specific type of core (therefore making the program cross-compatible with similar platforms) and hardcoding the number of cores (making it scalable).
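The model above can be condensed into a small sketch: the program defines two virtual threads, and their placement on physical cores is left entirely to the scheduler. This is plain portable code (Python’s threading module, purely for illustration), not Xbox 360 SDK calls:

```python
import threading

# The program only describes the work; the operating system's scheduler
# decides which physical core runs each thread. Nothing here names a
# core or assumes how many exist -- that's what makes SMP code scalable.
def partial_sum(chunk, results, slot):
    results[slot] = sum(chunk)

data = list(range(100_000))
results = [0, 0]
workers = [
    threading.Thread(target=partial_sum, args=(data[:50_000], results, 0)),
    threading.Thread(target=partial_sum, args=(data[50_000:], results, 1)),
]
for w in workers:
    w.start()
for w in workers:
    w.join()  # wait for both virtual threads to finish

total = sum(results)
assert total == sum(data)
```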


Representation of the multi-threading paradigm on Xenon.
A program may create n-threads (two in this example), and then the operating system’s scheduler takes care of dispatching the threads to physical cores. Bear in mind, the operating system also runs as a thread.

It’s no surprise that this style became a standard in subsequent console generations, as it’s easier to scale and program for symmetric systems than asymmetric ones (e.g. Cell). The latter often depend on unusual programming styles, eroding compatibility with other systems.

Also, it’s worth mentioning that, while SMP programming is not necessarily restricted to one platform, Xbox 360 developers will only be writing threads for Xenon (which provides a total of six threads). This enables them to optimise their multi-threading design for the Xbox platform as well.


Graphics

Here’s a peculiar aspect of Xenon. While the chip, by itself, incorporates interesting technology to make it a powerful contender against the competition, Xenon was never conceived as a one-man band. To make this console, as a whole, extremely competitive and long-lasting, Xenon was closely bonded to another chip, the GPU.

Assassin’s Creed (2007). Half-life 2 (The Orange Box, 2007). Portal 2 (2011). Borderlands 2 (2012). Grand Theft Auto V (2013).
Examples of Xbox 360 games, all rendered at their maximum resolution (1280x720 pixels).

The graphics chip is designed and manufactured by none other than ATI, Nvidia‘s direct rival. To put it into perspective, a company called ‘ArtX’ was set up by former SGI employees who had developed the Nintendo 64’s graphics chip, the Reality Co-Processor. Afterwards, Nintendo hired them again to produce the Gamecube’s Flipper chip… at the same time ATI was in the process of acquiring them [22]. By then, ATI was competing in the PC arena with their ‘ATI Rage’ graphics cards and would soon inherit Flipper’s engineers. Fast forward to 2005, and Nvidia (now supplying Sony) was facing ATI again, this time in the console market.

Good old competition, one can only wonder what new innovative product will come out of it.

Overview

Since the early noughties, Nvidia and ATI have been trying to outmatch one another in terms of performance, often by adding incremental improvements to their flagship brands.

The Xenos + EDRAM package next to GDDR3 chips.

Nvidia enjoyed considerable leverage after the introduction of the programmable pixel pipeline, which later became part of the OpenGL and Direct3D specifications. Thus, ATI had no other choice but to follow suit. However, in 2003, ATI recovered its userbase after Nvidia’s anticipated ‘Geforce 5’ line disappointed Direct3D 9 users, who were expecting better performance and functionality from Nvidia’s flagship card [23]. As a result, attention shifted towards ATI’s Radeon 9000 series.

These events allowed ATI to keep the crown for a while longer, but unbeknownst to Nvidia, ATI had been working on a new disruptive ingredient that could hold off Nvidia for another decade. This project eventually materialised in the form of Unified Shaders and debuted in none other than the Xbox 360, with the new graphics chip called Xenos.

A new foundation on the way

You’d be surprised to learn that Microsoft’s push for homogeneous computing also dragged the GPU into the mix.

You see, there was a time when GPUs were mere rasterizers with ‘take it or leave it’ functionality, meaning they provided a fixed set of functions that programmers could either activate or not. Later on, thanks to SGI’s innovations, workloads were offloaded from the CPU with the new programmable vertex pipelines. Finally, the ‘mere rasterizer’ definition was rendered obsolete (pun intended) as Nvidia promoted a new stage called the programmable pixel shader, which gave programmers the liberty to control what happens after rasterization.


Overview of the units that compose a GPU with programmable shaders.
In this example, our GPU provides more units for vertex operations than pixel operations, meaning geometry transformations will be quick at the expense of colour effects, for instance.

This pattern shows us that, as GPUs grow in performance and functionality, they eventually open up to allow developers to implement the exact functionality they seek. However, this also increases the overall complexity of the silicon.

Over the years, both vertex and pixel pipelines kept enlarging to accommodate current needs, and new scalability problems emerged. Space, costs and heat emissions have a limit. So, which tasks should be prioritised? Vertex operations or pixel effects? A weak vertex pipeline will stall the pixel pipeline. Alternatively, a disappointing pixel unit won’t attract users with its dully-composed frames.

Image

Overview of the units that compose a GPU with a new unified pipeline.
Now that both stages are provided with the same pool of resources, developers won’t have to trim down operations executed by weak shader units.

Thus, architects at ATI took two steps back and asked themselves ‘Is there any way to simplify this model instead of adding more transistors on top of it?’, which later became ‘Why are the pixel and vertex pipelines segregated?’. And so, the two stages were ultimately merged into a single (unified) block of circuitry. This is what we call the Unified Shader Model: a GPU implementing it still provides both programmable vertex and pixel pipelines, but the circuitry used for computations is shared.

The new model has two primary benefits: more balanced scaling (as it now improves both pixel and vertex pipelines to the same degree), and programmers can develop shader programs that are not constrained by the limitations of an individual pipeline type. The latter, however, required major changes to the APIs to standardise its behaviour and availability, which wouldn’t arrive until 2006 with Direct3D 10 or 2010 with OpenGL 3.3.

Suffice to say that the unified shader model is now another standard in the graphics industry.
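To put numbers on this, here’s a toy model (written in Python, with invented unit counts and job sizes, not Xenos-accurate figures) comparing a fixed vertex/pixel split against a unified pool of the same total size:

```python
# Toy model (not Xenos-accurate): a fixed 8 vertex + 16 pixel unit split
# versus a unified pool of 24 units, processing the same workload.

def cycles_fixed(vertex_jobs, pixel_jobs, v_units=8, p_units=16):
    """Each pipeline only works on its own job type; ceil-divide per pipeline."""
    return max(-(-vertex_jobs // v_units), -(-pixel_jobs // p_units))

def cycles_unified(vertex_jobs, pixel_jobs, units=24):
    """A unified pool schedules any job type on any unit."""
    return -(-(vertex_jobs + pixel_jobs) // units)

# A geometry-heavy scene: the fixed vertex pipeline becomes the bottleneck.
print(cycles_fixed(4800, 2400))    # 600 cycles (vertex units saturated)
print(cycles_unified(4800, 2400))  # 300 cycles (work spread across all units)
```

The same pool finishes a lopsided workload twice as fast here, which is the balanced scaling argument in miniature.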

Architecture of Xenos

To construct Xenos, ATI borrowed material already present in their consumer products, though it’s hard to say which existing card they based Xenos on. Wikipedia pairs it with the ‘R520’ core (found in the Radeon x1800) [24], though I find that hard to believe, as the R520 contains fundamental differences (related to the new unified architecture) that make me doubt such a relation. In the absence of any reliable source, I turned to the Xenia emulator Discord, where I luckily received a lot of input from the community, especially TriΔng3l, who works on emulating Xenos.

In essence, Xenos is based on the R400 (also known as ‘Crayola’), a ground-up new architecture that never materialised in the form of a consumer card (Xenos being its only incarnation to this day). Instead, ATI shipped R500-based cards to PC users, which are a continuation of the R300 architecture. Curiously enough, some R520 cards incorporated part of Crayola’s technology, but it was marketed as ‘ATI Avivo’ and only used for accelerating video decoding.

In the end, ATI brought the unified shader model to the masses through their new R600 architecture (an evolution of R400), also known as ‘TeraScale’, released in 2007 with the Radeon HD 2000. And the rest is history.

Interestingly enough, the R400 branch carried on in the smartphone world with the ATI Imageon Z430 [25], later called the Qualcomm Adreno 200, which also went through many iterations. This is another example of innovative technology finding its way into the mobile market!

Organising the content

Another unusual element of Xenos is its triple memory architecture (so much for emphasising the UMA…). Xenos relies on two different chips to render graphics: one is the aforementioned 512 MB of GDDR3 DRAM (which is shared with the CPU) and the other is a smaller but much faster memory chip residing within the same package (which only Xenos has access to). The latter provides 10 MB of Embedded Dynamic RAM (EDRAM) [26]. Curiously enough, EDRAM is not a new ingredient, having been found in the Graphics Synthesiser, the Graphics Engine and, most importantly, Flipper (also ATI’s invention).

Those 512 MB store most, if not all, of the materials Xenos needs to render a frame, including textures, shaders and many types of buffers as the game sees fit. On the other hand, the 10 MB of EDRAM are reserved for small elements that require rapid access, such as the Z-buffer, the stencil buffer, the back buffer (intermediate frame buffer) and any other custom buffer if needed. This mitigates congestion on the shared GDDR3 RAM and speeds up operations that make use of those buffers.
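As a back-of-the-envelope check (assuming a 32-bit colour buffer and a 32-bit depth/stencil buffer, which are typical but not the only formats available), we can verify that a 720p frame’s working buffers fit within those 10 MB:

```python
# Rough sizing check under assumed 32-bit buffer formats.
WIDTH, HEIGHT = 1280, 720
BYTES_PER_PIXEL = 4

colour = WIDTH * HEIGHT * BYTES_PER_PIXEL          # back buffer
depth_stencil = WIDTH * HEIGHT * BYTES_PER_PIXEL   # z-buffer + stencil

total_mb = (colour + depth_stencil) / (1024 * 1024)
print(f"{total_mb:.2f} MB of the 10 MB of EDRAM")  # ~7.03 MB, it fits
```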

Image

Example of how data is organised across the memory available

As if this wasn’t enough, there’s a third source that can feed the GPU: a direct line to the CPU! Unlike anything seen before, the CPU can stream commands and geometry without going through the traditional steps of storing command buffers in external memory, once again saving traffic on main RAM. This is what Microsoft advertised as Xbox Procedural Synthesis (XPS), made possible by two changes [27]:

  • Firstly, the CPU’s front-side bus was adapted so the GPU can directly fetch data from the CPU’s L2 cache (where the CPU may procedurally generate the geometry) in blocks of 128 bytes (the size of a cache line).
  • Secondly, a separate cached location was added to Xenon so the GPU can notify the CPU of its current state as fast as possible. Microsoft calls it the Tail pointer write-back, and it keeps both components in sync while the CPU updates the L2 cache and the GPU pulls from it. According to Microsoft, this routine provides a theoretical bandwidth of 18 GB/sec [28].
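The two changes above can be pictured as a tiny producer/consumer ring (a hypothetical Python sketch, not real XPS code): the CPU writes cache-line-sized blocks while the GPU reports back how far it has consumed:

```python
# Toy producer/consumer model (assumed behaviour, not real XPS code): the
# CPU writes 128-byte blocks into a ring while the GPU writes back a
# 'tail pointer' telling the CPU how far it has consumed.
BLOCK = 128            # one cache line
RING_BLOCKS = 8        # tiny ring, just for the example

ring = [None] * RING_BLOCKS
head = 0               # next block the CPU will write
tail = 0               # tail pointer written back by the GPU

def cpu_produce(data):
    global head
    if (head + 1) % RING_BLOCKS == tail:
        return False   # ring full: the CPU must wait for the tail to advance
    ring[head] = data
    head = (head + 1) % RING_BLOCKS
    return True

def gpu_consume():
    global tail
    if tail == head:
        return None    # nothing to fetch yet
    data, tail = ring[tail], (tail + 1) % RING_BLOCKS
    return data

cpu_produce(b"\x00" * BLOCK)
assert gpu_consume() == b"\x00" * BLOCK
```

The write-back of `tail` is what lets the CPU know when it is safe to overwrite old blocks, without either side polling main RAM.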

Overall, this synergy is what has enabled the Xbox 360 to achieve such a high level of performance without requiring overpriced hardware.

Constructing the frame

Having said all that, how does Xenos actually render a frame? Well, similarly to other GPUs on the market, there are many steps involved. Take a look at this diagram of Xenos’ pipeline:

Image

Overview of the graphics pipeline in Xenos.

As you can see, the pipeline stages are no different from those of other graphics chips that don’t employ the unified shader model. This is because the actual change happens at the circuitry level and not at the API level (for now). This way, developers are not forced to learn new techniques that would disrupt their traditional methods, but they’ll soon find that the new underlying design benefits them in terms of performance and extra functionality (on top of the ‘traditional’ one).

Moreover, this GPU is strongly aligned with a proprietary API developed by Microsoft called Direct3D 9. This API has enjoyed increasing adoption in the PC market due to its broad compatibility with off-the-shelf graphics cards, making the job of game developers a bit more enjoyable. And even though Xenos exposes some behaviour not seen in the off-the-shelf market, Direct3D is the only high-level API that Microsoft provided to talk to Xenos. So, what we find in the Xbox 360 is a semi-customised version of Direct3D 9 that makes room for the extra functions of Xenos. Curiously enough, Direct3D influenced the design of Xenos in the same way Xenos influenced future revisions of Direct3D (starting with version 10).

Now it’s time to make a full dive and see how this pipeline works, just like we did in the other articles.

Image
Overview of the command stage

Welcome to the tour of the 12th polygon factory of this series. As always, the starting point is the command stage. Commands tell the GPU what, where and how to draw something on the screen. This time, however, commands may be stored in Main Memory (within a buffer) or directly streamed by the CPU. Both are subsequently fetched by the Command Processor [29], which parses them and forwards them to the respective unit that performs the required operations (as commands may encode different types of instructions, such as ‘draw a triangle’ or ‘set X register’).

It’s worth mentioning that by this generation, the traditional practice of using Display Lists for composing commands had long been superseded by a new structure called the Command Buffer (as OpenGL and Direct3D call it). While both terms sound similar, the new entity makes room for operations that weren’t originally envisioned when Display Lists were conceived (primarily related to vertices, pixels and GPU control). As the APIs evolve, Command Buffers are adapted to reflect modern needs, while Display Lists have been deprecated since 2008 and were later removed from OpenGL’s core profile [30].

Furthermore, Direct3D 9’s command buffers (found in the official SDK of the Xbox 360) provide the option to embed an interesting data format called Index Buffers. These enable developers to reuse existing vertex points, avoiding the duplication of geometry data in memory [31].
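Here’s a small sketch (a hypothetical quad, not SDK code) of why index buffers save memory: a quad drawn as two triangles needs six vertex references, but only four unique vertices:

```python
# Hypothetical example: a quad as two triangles sharing an edge.
vertices = [(0, 0), (1, 0), (1, 1), (0, 1)]   # 4 unique corners, stored once
indices = [0, 1, 2, 0, 2, 3]                  # 2 triangles reusing them

# Expand the index buffer into the actual triangles the GPU would draw.
triangles = [tuple(vertices[i] for i in indices[t:t + 3])
             for t in range(0, len(indices), 3)]
print(len(vertices), "stored vertices instead of", len(indices))
```

The saving grows with mesh complexity, since most vertices in a dense mesh are shared by several triangles.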

Finally, this GPU can also conditionally execute commands depending on arbitrary conditions set by the developer. In Direct3D’s world, this is called Predication, and it saves the GPU from processing geometry that may be occluded, for instance.

The impact on the industry

The state of the industry 15 years later has demonstrated that the Unified shader model was meant to become another building block within GPUs. Interestingly enough, when Direct3D 10 and OpenGL 3.3 debuted in 2006 and 2010, respectively, not only did they standardise this model, but they also defined new features and applications that were now possible to implement. For instance, a new pipeline stage called the Geometry shader was added in Direct3D 10 to expand the uses of the tessellator.

Image
Froggy (2006), Nvidia’s demo showcasing the capabilities of unified shaders (incorporated with the Geforce 8 line). For demonstration purposes, the demo also allows you to, ehem, slap and stretch the poor ugly frog.

Similarly, in the years following the console’s release, the concepts of Xenos’ shader exports would be embraced by many new APIs, namely Apple & Khronos’ OpenCL, Nvidia’s CUDA, Direct3D 11’s DirectCompute and OpenGL 4.3’s Compute shaders. All of these provide a platform to access the power of GPUs without necessarily having to render anything, just perform computation using the fast shader pipes. Overall, this was another great leap for General-purpose GPU programming.

Now, you may wonder whether any of these advancements were later incorporated into the Xbox 360. I’m afraid that by the time this console debuted, unified shaders were still a relatively new concept. So, unfortunately, none of the subsequent developments (i.e. Direct3D 10) were ever back-ported to the Xbox 360. But hey, that’s a legacy left for the next generation of consoles to enjoy!

On the other hand, the PC market experienced a different revolution that, strangely, didn’t originate from ATI. It turns out Nvidia won the race to become the first company to ship a graphics card using the unified model: I’m referring to the Geforce 8 series (and its Tesla architecture), which landed in 2006 [35]. Ironically, it was the same year the Playstation 3 arrived (also powered by Nvidia’s technology, though with an older architecture…).

Video Output

Unlike the Playstation 3, which arrived a year later, the 2005 release of the Xbox 360 got caught in the middle of a transition from analogue video (i.e. RGB and composite) to digital (i.e. HDMI), so the first shipments of the console didn’t bundle the famous HDMI connector that quickly displaced the analogue video salad.

Image
Two different revisions of the Xbox 360 stacked together; the top one is a Zephyr and the bottom one is a Xenon. Notice that the Zephyr model adds the HDMI connector, while Xenon only has the analogue A/V socket.

Instead, the first revision (called Xenon, like the CPU) is only equipped with a multi-signal analogue socket called ‘A/V’. This carries all the signals previously featured on the original Xbox, with the addition of VGA-compliant pins [36] (its predecessor had the wires in place but, for some reason, they were never powered [37]).

Later on, in 2006, a follow-up motherboard revision called Zephyr added an HDMI socket to catch up with Sony’s signal quality. Internally, the original video encoder (the ‘ANA’ chip) was replaced with the ‘HDMI ANA’ (or ‘HANA’) block [38].

As outdated as this may seem today, Microsoft still required game developers to account for their games being played on CRT screens, which were prone to overscan. Thus, games couldn’t place any important indicators within the ‘Danger Zone’ (the outer region a CRT may crop).

Short-lived features

Following the requirements of the Video Electronics Standards Association (VESA), Microsoft released a software update in November 2008 to support 16:10 aspect ratios, although only through the VGA signals. Games were still only required to support 16:9 and 4:3 ratios, so using 16:10 would show black bars (a.k.a. letterboxing). Curiously enough, the operating system contains routines that would allow games to detect whenever 16:10 is activated, though they were never made public.

Additionally, Microsoft shipped a software update in May 2011 to add support for 3D TVs, much like the Playstation 3 did.

New attitudes towards resolutions

Even though the video encoder can broadcast signals in a multitude of formats, including PAL, NTSC, 720p and 1080p, Microsoft pushed publishers to focus on 720p for two reasons:

  • The general consensus in 2005 predicted that most consumer TVs would soon be equipped with high-resolution, widescreen panels.
  • Since it would take time to master Xenos’ capabilities, launch titles were assumed not to use all of the GPU’s resources. Thus, games could painlessly add HD support to compensate.

By the same rationale, why wasn’t 1080p the focus instead? Well, 1080p would put a significantly higher workload on Xenos, which would limit effects like anti-aliasing. Instead, Microsoft suggested that developers render at 720p and direct the internal video scaler to upscale it to 1080p, if required. To assist with this, Microsoft released another software update in October 2006 to enable 1080p (initially, only 1080i had been available).
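The arithmetic behind the recommendation is simple: 1080p shades 2.25 times as many pixels per frame as 720p, so the same per-pixel budget stretches much further at the lower resolution.

```python
# Pixel counts per frame at each resolution.
pixels_720p = 1280 * 720    #   921,600 pixels
pixels_1080p = 1920 * 1080  # 2,073,600 pixels

print(pixels_1080p / pixels_720p)  # 2.25x the shading workload
```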


Audio

As CPUs evolve to become multi-core jacks of all trades, the grounds for housing dedicated audio hardware slowly dissipate. In the end, Microsoft decided that the Xbox 360’s audio department would be the responsibility of the CPU and, as such, the audio pipeline is implemented in software. The operating system provides routines which games access through the official SDK. This brings new advantages, as the general-purpose capabilities of CPUs are not restricted to a fixed set of audio functions.

A year later, the PS3 would follow suit and cement this practice throughout the video game industry.

The hidden audio chip

That said, in the case of the Xbox 360, we find a huge exception to this rule: Microsoft fitted a hardware audio decoder on the motherboard [39], although this circuit is hidden within the Southbridge chip (explained further in the ‘I/O’ section) and it only supports one codec, called Xbox Media Audio (XMA). The latter is a variant of Microsoft’s proprietary Windows Media Audio (WMA), which you may have heard about already. In any case, why was that codec so crucial that Microsoft invested extra hardware in a console already facing budget cuts?

Image
Windows Media Player 6, released in 1999. This was Microsoft’s weapon against its contemporaries, namely Quicktime, Winamp and so forth.

Back in the naughties (and even the late 90s), during the rise of the World Wide Web and affordable multimedia products (i.e. accelerator cards, SIMD-capable CPUs and fast network bandwidth), many companies which were once narrowly focused on productivity software suddenly ventured into a new area of multimedia services.

It so happens users now wanted to watch full-motion movies on a PC, listen to an Audio CD while working on a spreadsheet, or rip (and even mix) Audio CDs, just to mention a few examples. We of course take all of this for granted now, but mind that these ‘abilities’ had to be implemented at some point in history. Initially, they were delivered through third-party software (i.e. Nero and Winamp, to name a few) and later bundled with the operating system (i.e. Windows XP and its integrated CD burner; and Mac OS X 10.1 bundled with iTunes).

Now, if we take a deeper dive into the music world and its never-ending sharing systems, we see that MP3 became the de-facto codec for transmitting music [40], which brought huge demand for media players that decode MP3 (well, technically, MPEG-1). That being said, what was Microsoft’s response to this trend? Well, to develop another codec to displace MP3: a new format called Windows Media Audio or ‘WMA’ [41].

A look at the XMA accelerator

In terms of functions, the XMA decoder can decode up to 5.1 audio channels with a sample rate of 48 kHz and a resolution of 16 bits. No surprises here!

Image

Overview of the audio pipeline

Moving on, the decoder works as follows:

  1. It starts by reading XMA-encoded data from main RAM.
  2. Decodes it.
  3. Outputs PCM data, ready to be streamed, in an arbitrary location in main RAM.
  4. Finally, the CPU can apply effects on it and/or forward it to the audio DAC for hearing.
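The four steps above can be sketched as follows (the decoder here is a stand-in, since real XMA decoding is proprietary and runs on the hardware block):

```python
# Hypothetical sketch of the decode pipeline; 'fake_xma_decode' is a
# stand-in for the undocumented hardware decoder.
main_ram = {"xma_stream": b"xma!", "pcm_out": None}

def fake_xma_decode(stream: bytes) -> bytes:
    """Stand-in decoder: pretend each compressed byte yields two PCM bytes."""
    return bytes(b for byte in stream for b in (byte, byte))

# Steps 1-3: read XMA data from main RAM, decode it, write PCM back to RAM.
main_ram["pcm_out"] = fake_xma_decode(main_ram["xma_stream"])

# Step 4: the CPU may post-process the PCM before it reaches the DAC.
def cpu_effect(pcm: bytes) -> bytes:
    return pcm[::-1]  # trivial placeholder 'effect'

dac_input = cpu_effect(main_ram["pcm_out"])
```

Note how both the input and output live in main RAM, which is why the CPU can freely mix effects in before the DAC stage.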

All of this is abstracted through the use of Microsoft’s libraries, which handle how data is streamed, optimised in cache and moved around in memory. Developers have multiple APIs at their disposal, each offering a thinner or thicker layer of abstraction. Be that as it may, even the thinnest API won’t provide direct access to the XMA decoder, as it’s still a proprietary and undocumented block of circuitry, after all.


I/O

Following in the footsteps of the original Xbox, all I/O operations are handled by a thick and opaque chip called the Southbridge (though in various documents I’ve seen it referenced as the ‘I/O chip’ or ‘XBS’ as well). The Southbridge connects to most, if not all, internal and external interfaces. This chip is only connected to the GPU (not the CPU), so Xenon has to go through Xenos to reach it and subsequently request data from any socket.

Image

The Southbridge chip supervising small I/O chips and sockets.The same picture with important parts labelled.

The Southbridge communicates with Xenos using the PCI Express protocol, composed of two unidirectional serial buses, one for each direction. In the case of this console, each bus can transfer up to 500 MB/sec [43].

Clever L2 access

To keep up with demand, the Southbridge can write to main RAM. Consequently, there needs to be a mechanism in place to maintain data coherence between the CPU’s caches and the changes made by the Southbridge. Hence, the latter is equipped with an automatic trigger that flushes the CPU’s L1 and L2 if, and only if, the memory address the Southbridge wrote to had been cached by the CPU. The CPU will have to cache that memory again, but programmers won’t need to worry about manually synchronising the CPU caches every time the Southbridge fiddles with main RAM.
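A toy model of this automatic trigger (assumed behaviour, not the real bus logic) could look like the following: the Southbridge’s write invalidates a cached line only if the CPU had cached that address.

```python
# Toy coherence model: the Southbridge invalidates stale CPU cache lines.
cache = {}      # CPU cache: address -> value
main_ram = {}   # shared memory

def cpu_read(addr):
    if addr not in cache:
        cache[addr] = main_ram.get(addr, 0)  # cache miss: fill from RAM
    return cache[addr]

def southbridge_write(addr, value):
    main_ram[addr] = value
    cache.pop(addr, None)  # flush the line only if the CPU had it cached

main_ram[0x100] = 1
cpu_read(0x100)               # the CPU caches the old value
southbridge_write(0x100, 2)   # hardware invalidates the stale line
assert cpu_read(0x100) == 2   # the next read re-fetches the fresh value
```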

The core of Southbridge

You’ve seen before that the abundance of services in 7th generation consoles has led to the inclusion of many black boxes. These are meant to offload most of (if not all) the security and I/O tasks away from user-accessible components like the CPU and GPU.

Image

Diagram of the Southbridge’s connections.

The Xbox 360 is no stranger to this. Hidden within the Southbridge chip there’s a component called the System Management Controller (SMC) and, like the Wii’s counterpart, it draws current even in standby mode. The SMC abstracts lots of I/O operations, including power management, the real-time clock, temperature control, LED control and the infrared sensor [44]. The CPU communicates with the SMC using a FIFO command buffer, though this task is restricted to the kernel only (more about it in the ‘Operating System’ section). So, neither the user nor the game can fiddle with it.

Internally, the SMC is nothing more than a classic Intel 8051 microcontroller with its own ROM and RAM [45]. Upon getting power, it boots its internal program and proceeds to read from a separate NAND chip to fetch configuration files.

There are other interesting aspects of the SMC, but we’ll discuss them in due time (don’t worry, I’ll be waiting in the ‘Anti-piracy’ section…).

External interfaces

The set of external ports is not particularly overwhelming, but there are a fair number of them.

Image

Front view of the console, some interfaces are hidden by springloaded doors.

In any case, the console offers:

  • Three USB 2.0 A sockets: one is found at the back of the console and it’s meant to connect an external accessory (like the official Wifi module) that Microsoft sold separately. The other two are on the front and can be used to connect wired controllers.
    • With the arrival of the ‘Slim’ Xbox 360 model in 2010, an internal Wifi card has since been bundled (internally connected through USB as well) and two more USB sockets were added at the back.
  • A special socket (combining USB and proprietary interfaces) for the RF module, also known as the front panel, which lights up when the console turns on. This component also handles communication with the wireless controllers (through a proprietary wireless protocol) [46].
  • Two Memory Unit ports: these are mangled USB ports that fit a Memory Unit, a proprietary Memory Card of up to 512 MB in size [47].
  • An RJ45 port for Ethernet connectivity.
  • Infrared Receiver to capture infrared light sent by an Xbox-themed remote controller (an optional accessory).
  • One SATA II socket: mixed with power lines (resulting in a proprietary socket) and exposed at the top of the case (if the console is placed vertically) to connect the optional external hard drive.

Wired with a chance of Wireless

While Sony went straight for wireless controls using Bluetooth technology, the Xbox 360 shipped with different kinds of controllers depending on the ‘edition’ purchased (this doesn’t necessarily affect the motherboard revision, only the amount of accessories bundled with the box).

Image
The new wireless controller/gamepad [48]. Two AA batteries are fitted on the back.

As you may guess, wireless controllers were only bundled with the posh ‘premium’ editions, while the low-end, HDD-less ‘Arcade’ model shipped with a single wired controller (this time, using a standard USB A cable). Wireless controllers required two AA batteries, though Microsoft also sold a rechargeable battery pack.

The console can handle four controllers at the same time, whether wireless or wired, but note that the console only provides three user-accessible USB ports.

Compared to the old duke, the controller no longer takes two memory cards, but it keeps the jack pin (2.5 mm TRS type). The wireless model has a special socket on the front to plug in the ‘Play and Charge’ cable, an optional accessory that connects wireless controllers to the console and charges the battery at the same time. The cable uses a proprietary bus layout composed of USB-compliant pins with additional undocumented ones [49].

Internal interfaces

These are the interfaces that the Southbridge handles to interconnect chips within the motherboard:

  • One SATA II socket to plug the internal DVD drive.
  • A proprietary 8-bit bus to communicate with a 16 MB NAND Flash chip.
  • Another proprietary bus to talk to the Audio DAC.

That interactive accessory

I don’t tend to give too much attention to accessories because they deviate from the main focus of the article, but there is one special case where Microsoft dedicated an obsessive amount of marketing effort. They were even convinced that ‘gamepads are a thing of the past’ and that it was only a matter of time before this new product would eradicate current methods of user interaction.

Well, that didn’t happen, but Kinect is still worthy of study. Released in November 2010 with huge fanfare, Kinect is a sensor bar (similar but not identical to the Wii’s sensor bar, Kinect’s ‘competitor-slash-inspiration’) that captures the user’s position, movement and voice in an effort to provide control of the game without requiring any physical object in the user’s hands.

Image

The Kinect bar, released in November 2010 [50].

The construction of this device isn’t simple by any means, and as such, the sensor carries many forms of technology:

  • An RGB camera to capture images of the room and the player [51].
  • An IR emitter and an IR sensor that project IR light and capture its reflection, respectively. This allows the device to derive a z-buffer of the room, thereby identifying the depth of objects.
  • A microphone that listens to the player.
  • A chip salad that handles the sensors and signals to produce meaningful data from them. This includes 64 MB of DDR2 SDRAM (as a working area), 1 MB of Flash memory (where the main program is stored), a Marvell AP102 SoC (presumably containing the main CPU inside) and a PS1080-A2 DSP (to process the image coming from the cameras), among others. All of this is fitted on three different boards stacked together [52].

The sensor is operated through the use of Microsoft’s opaque APIs, which in turn communicate with super-secret software installed on both the Kinect and the Xbox 360’s Operating System. The former plugs into the console using a USB 2.0 A connection. The ‘Slim’ re-design of the Xbox 360, released the same year, was also branded ‘Kinect compatible’ as it could supply power to the bar as well (with the provision of an extra proprietary port on the back of the console), while older models required a separate power brick for the Kinect.


Operating System (and backwards compatibility)

The Xbox 360 was subject to the same needs and fashions that the Playstation 3 went through. Thus, the console offers many services, including online gaming (through Xbox Live), a digital marketplace, a media player, a file system explorer (albeit an extremely simple one) and other utilities.

Overview

The Operating System of the Xbox 360 is a collection of bare-metal utilities and userland applications tightly squashed together to fit into 16 MB of NAND. Just like the Playstation 3, Microsoft built their OS to provide multimedia and networking (i.e. multi-player) capabilities, plus the ability to play games and secure the system from external intrusion.

That being said, the system contains a few core components that take care of the low-level area of this console (hardware access, security and resource management). The rest is composed of applications (i.e. the splash animation, the interactive shell called ‘Dashboard’ and the game itself) and user data (i.e. game saves, user profile, network settings).

A tumultuous progress

Throughout the life of this console, I noticed a more chaotic evolution than its competitor’s, involving constant re-branding and radical redesigns of the user interface. I presume Microsoft was trying to shift from one target audience to another. So, to understand this section better, I’ve organised its evolution into the following periods:

  • The multimedia hub era (2005-2008): as the first home console to debut in the 7th generation, high-definition graphics, Xbox Live (the online platform) and HD movies were the main selling points. And in case you got bored with them, you could resort to buying accessories [53] (i.e. the remote controller, the headset, the HD DVD reader) to ‘enhance’ your experience.
  • The competitive era (2008-2011): with the Playstation 3 and the Wii now occupying store space, Microsoft found itself needing to compete aggressively for the consumer’s attention. First, a new system update called ‘New Xbox Experience’ overhauled the old interface to increase user personalisation (i.e. 3D avatars) and enhance the social aspect of this console. Meanwhile, a new ‘killer app’ landed in the form of a physical accessory: Kinect. Nobody knew how it would turn out, yet it proved a very engaging topic of conversation.
  • The cloud era (2011-2013): as all the previous enthusiasm slowly dissipated, Microsoft simplified its user interface to align with its other platforms (Windows 8 and Windows Phone 7) and provided users with the chance to move their local data to the cloud (a.k.a. Microsoft’s servers). In the process, third-party advertising became more prominent in the user interface (possibly to increase revenue). By then, users were expecting the successor of the Xbox 360.
  • The capitulation (2014 onwards): with the new successor (the Xbox One) reaching stores, Microsoft dedicated minimal effort to their (now old) console. Users would only receive system updates in the form of bug fixes.

A recallable security design

If you recall my deconstruction of the Playstation 3’s operating system, the latter was divided into three areas:

  • The Hypervisor, which has complete control of the hardware.
  • The Kernel, interfacing user programs (mostly games) with the hypervisor, and in doing so it provides an extra layer of protection.
  • The user space, encompassing all programs executing on top of the kernel, including games and other apps (i.e. the shell and media player).

This design is correlated to the PPU’s privilege levels (affecting both Cell and Xenon) which prevents ‘casual’ applications, like games, from accessing sensitive resources, such as decryption keys.

IBM implemented three privilege levels (instead of just two) to allow multiple operating systems to run at the same time. With this idea, each OS would only live under the two lowest levels, while the highest level would be reserved for the program supervising all operating systems. In practice, the Playstation 3 and Xbox 360 only require a single operating system (except for OtherOS, but that was quickly scrapped). Consequently, Sony and Microsoft designed hypervisors that enforce their respective security models and perform memory management tasks. That said, the architectural differences between Cell and Xenon led to very distinct hypervisor implementations, so each is subject to unique flaws and strengths.

Image

Diagram showing how the components of the Xbox operating system fit in Xenon’s privilege levels.

The most notable difference between the security models of the Xbox 360 and the Playstation 3 is that the former runs both the Kernel and user-space programs under the same privilege mode (the second level) [54]. So, all critical tasks rest within the Hypervisor, which enjoys extra acceleration from the hardware side (I’ll explain more about it in the ‘Anti-piracy’ section).

Architecture

Let’s take a deep dive into the operating system’s main components. Note that this doesn’t include the boot loaders; I’ve dedicated a separate section to those. For now, we’ll check the components that reside in memory after the console boots up.

The Hypervisor is an incredibly small program that only occupies 128 KB [55]. Furthermore, by being assigned the highest privilege level available, it’s tasked with:

  • Preventing programs from accessing areas beyond the permitted boundaries.
  • Combined with the use of page tables, it restricts which areas of memory can be executed by the CPU. This technique is called Write Xor Execute or ‘W^X’.
  • Encrypting executable code.
  • Refusing to load executables that aren’t signed by Microsoft.
  • Protecting memory data from being tampered with. This is done by encrypting the data in main RAM and decrypting it just after the CPU fetches it from L2 cache.
    • This is a complex but interesting feature which is explained further in the ‘Anti-piracy’ section.

The hypervisor is operated through ‘system calls’ [56]; external programs invoke these to request functions from the hypervisor, though whether the hypervisor complies or not depends on the program’s accreditation.
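A hedged sketch of this gate (the call names are invented; the real system-call table is undocumented): the hypervisor only honours privileged requests from accredited callers.

```python
# Hypothetical model of the accreditation check; these call names are
# made up for illustration, not the Xbox 360's real system-call table.
PRIVILEGED_CALLS = {"set_page_executable", "load_signed_executable"}

def syscall(name, caller_accredited):
    """Honour a privileged request only if the caller is accredited."""
    if name in PRIVILEGED_CALLS and not caller_accredited:
        raise PermissionError(name + ": caller not accredited")
    return name + ": handled by the hypervisor"

print(syscall("load_signed_executable", caller_accredited=True))
```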

Backwards compatibility

As a compelling side note, the Xbox 360 can run a considerable subset of games belonging to the previous Xbox (to avoid confusion, let’s call it the ‘classic’ one). This was done through pure software emulation and, while it didn’t provide the same degree of compatibility as the CECHA PS3 revision, I believe the methods were more clever, cost-effective and consistent.

The emulator was called ‘FU’ [60] (found as xefu.xex) and it could only be installed on the Hard Drive, residing next to other optional assets of the operating system. FU loaded every time the user booted a classic Xbox game on the Xbox 360. However, for an ‘enjoyable experience’, FU would only proceed with the emulation if the game was listed in its local whitelist, where Microsoft tracked which games were deemed ‘playable’. As more system updates were released, more games were added to the whitelist.

Michael Brundage, one of the authors of FU, described the development process as ‘probably the hardest technical challenge of my career’ and also mentioned ‘how inefficient most standard libraries are, especially at math.’ [61].

Storage medium

Now that we’ve seen the structure of the OS, let’s check how information is scattered across this console. You’ll soon find that there’s an abundance of mediums available, and Microsoft had to devise different file systems and protocols to keep the security integrity in place.

Here lies the most critical and fragile program of this console. Similarly to the Playstation 3 (again, both share IBM technology), Xenon hides a 32 KB ROM that stores the first stage of the boot loader, along with Microsoft’s RSA public keys and SHA-1 hashes used to decrypt and validate further boot stages, respectively [62].

This area happens to be unencrypted, as the security system is not set up yet and the CPU only understands unencrypted machine code. This is not a concern for Microsoft, since the boot ROM is sealed within the CPU die and the public key can’t be used against the system. On the other hand, being a ‘ROM’ also means that this code can’t be patched after it leaves the factory.

In any case, the boot ROM is only executed once after the console is powered on.

Boot process

Ok, so you’ve seen how this system works, how it’s structured and where it’s stored. But there’s still a missing piece: how do you go from ‘nothing’ (off state) to a ‘Hypervisor running an encrypted Kernel loading up executables signed by Microsoft’? That’s what the boot loader is for, and there are many, many stages in this system.

Multicore chaos

Before we begin, we need to address something fundamental. With homogeneous multi-core CPUs, all cores are entitled to ‘own’ the system, but this would only lead to chaos, so only one is tasked with directing the rest. The leader is commonly called the ‘master core’ while the rest are named ‘slave cores’ or just ‘cores’.

Outside the Xbox world, this hierarchy is sometimes found hardwired into the chip, while other architectures need manual intervention. For instance:

  • On ARMv7, all cores wake up at the same time and look for the same reset vector (0x00000000). Thus, the program, now running multiple times, must make each CPU query its mpidr register to identify what type of CPU (master or slave) it’s running from. After this, it can take appropriate action (i.e. give orders or wait for commands) [73].
  • With x86, the first core is the master. It starts executing at the reset vector (0xFFFFFFF0) while the other cores stay ‘asleep’. The master then sends an interrupt to the neighbouring cores with an arbitrary address directing them where to start execution.

In the case of PowerPC, well… it’s a tricky situation because Xenon was a custom order and not part of IBM’s PowerPC 2.02 spec. But by looking at some low-level homebrew source code for this console [74], I presume the boot handling is similar to x86.
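The two schemes above can be condensed into a toy model (written in Python purely for illustration, the real mechanism is hardware and assembly): every core runs the same reset code, identifies itself, and either becomes the master or waits for the master to hand it an entry point:

```python
# Toy model of master/slave boot: all cores execute the same reset handler
# (as on ARMv7), check their own ID (like querying mpidr), and the master
# posts start addresses for the slaves. Addresses are invented.

mailbox = {}  # core id -> entry point, written by the master

def reset_handler(core_id):
    if core_id == 0:
        # Master: decide where each slave should start executing.
        for slave in (1, 2, 3):
            mailbox[slave] = 0x8000_0000 + slave * 0x1000
        return "master"
    # Slave: read the master's order (a real core would spin or sleep on
    # an interrupt until the mailbox is filled; here it already is).
    return mailbox[core_id]

# Core 0 runs first and fills the mailbox, then the slaves wake up.
roles = [reset_handler(core) for core in range(4)]
```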

Boot procedure

With all being said, now I’ll describe what the Xbox 360 does after the user presses the power button.

First things first, the reset vector of the master core appears to be 0x80000200.00000000 [75], this points to a secret location within the CPU: the aforementioned 32 KB of BootROM, where the first stage of the boot process is located.

For the sake of completeness, I’ll briefly mention a special device called the ‘eFuse’. This is another storage medium similar to ROM or RAM, but with distinct physical qualities. I’ll talk more about it in the ‘Anti-piracy’ section to avoid overwhelming you, so for now, just remember those basic facts.

Without further ado, here is how the boot chain works, the starting point being the boot ROM [76][77]:

  1. 1BL (as in ‘1st Boot Loader’): The master core fetches the next stage (called ‘CB’) from NAND into the internal SRAM. CB is found encrypted with the RC4 algorithm, so the CPU derives a decryption key using a key stored in 1BL and another stored in the header of CB. Once decrypted, the CPU checks its validity by hashing the decrypted area and comparing the resulting hash with another stored in CB. The latter is validated using Microsoft’s public RSA key, stored in the internal ROM. Finally, if the whole process completes without problems, the CPU continues execution at CB; otherwise, the CPU triggers a POST error and the Xbox 360 starts flashing red lights.
    • This process involves a combination of many cryptographic primitives, such as HMAC, SHA-1 and ROT.
  2. 2BL or CB: The CPU, now executing from the internal SRAM, queries the eFuses to check that the version of CB running matches the one written on the eFuses. Then, it initialises most of the hardware. Finally, the CPU proceeds to fetch another block from NAND called ‘CD’ onto main RAM and repeats similar decryption and verification tasks 1BL did, this time using another key derived from the ‘CPU key’ (stored in the eFuses). If everything goes well, the next stage proceeds.
    • In later software updates, this process was split into two bootloaders (named CB_A and CB_B) to mitigate faults that were exploited to execute homebrew (more details in the ‘Anti-piracy’ section).
  3. 4BL or CD: This is another decryption stage, albeit closer to the actual payload. This time, the CPU fetches yet another block called CE, which goes through familiar verification processes (involving RSA checks) and is subsequently decompressed in main RAM. Once unpacked, the CPU finds itself in front of the Kernel and the Hypervisor. But it’s not over yet, I’m afraid, as the CPU must now fetch system updates to apply to these two programs, a process known as CF. Before continuing, however, CD compares the number of updates installed against a designated counter in the eFuses block. This is done to prevent downgrades.
    • The decompression algorithm used is called ‘LZX Delta’ and it’s authored by Microsoft.
  4. CF: This stage takes care of fetching the update blocks stored on NAND; these are decrypted and decompressed with LZX as well. Once they are ready, they are applied to the kernel and the hypervisor residing in main RAM. Once all updates are done, CF returns to CD.
  5. CD (upon return): Now that everything is set, the CPU is redirected to the Hypervisor in main RAM.
  6. Hypervisor: This program takes over the hardware and then redirects execution to the Kernel, which in turn loads up the Splash animation and the Dashboard.
  7. The user is now in control.
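The common thread across these stages is that each one carries the expected hash of the next, so trust flows outward from the immutable boot ROM. Here is a minimal sketch of that chain-of-trust idea; the stage names match the article, but the payloads and hashing scheme are simplified stand-ins (the real chain also mixes in RSA signatures and per-console keys):

```python
# Minimal chain-of-trust sketch for 1BL -> CB -> CD -> CE: each stage
# stores the hash of its successor and refuses to jump to it on mismatch.

import hashlib

def sha1(data):
    return hashlib.sha1(data).hexdigest()

# Build the chain backwards: every stage embeds the hash of the next one.
ce = {"name": "CE", "payload": b"kernel+hypervisor", "next_hash": None}
cd = {"name": "CD", "payload": b"loader-cd", "next_hash": sha1(ce["payload"])}
cb = {"name": "CB", "payload": b"loader-cb", "next_hash": sha1(cd["payload"])}

def boot(stage, successor):
    """Verify the next stage before handing over execution."""
    if sha1(successor["payload"]) != stage["next_hash"]:
        raise RuntimeError("stage verification failed: flash the red lights")
    return successor

stage = boot(cb, cd)   # CB verifies CD...
stage = boot(stage, ce)  # ...and CD verifies CE
```

Tampering with any intermediate payload breaks the hash comparison, which is why an attacker can’t simply swap one block on NAND.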

Updatability

I guess it’s no news for you, but yeah, this console has an updatable operating system.

Though it may surprise you that, no matter the revision, all consoles ship with the same initial kernel (version 2.0.1888) from 2005 [78]. What happens is that the software updater installs ‘update packets’ in NAND, in the form of delta patches, and these are applied during boot, as you’ve seen in the previous section.
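In other words, the base kernel never changes; patches are layered on top of it at every boot. A hedged sketch of the idea follows. The patch format here (offset plus replacement bytes) is invented for clarity; the real updates use Microsoft’s LZX Delta format:

```python
# Sketch of delta patching: keep the immutable 2005 base image and apply
# stored patches at boot to produce the current kernel. The patch format
# below is fabricated; real updates use LZX Delta.

BASE_KERNEL = bytearray(b"kernel v2.0.1888 ....")

def apply_delta(image, patches):
    """Return a patched copy; the base image in NAND stays untouched."""
    patched = bytearray(image)
    for offset, new_bytes in patches:
        patched[offset:offset + len(new_bytes)] = new_bytes
    return bytes(patched)

# A fictional update rewriting the version string:
update = [(7, b"v2.0.17559")]
current = apply_delta(BASE_KERNEL, update)
```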

Image

Old Xbox interface from 2005 installing an update (bye bye Blades).

Updates can be fetched from Xbox Live (as an online download), through a retail game, by burning a conventional CD (that’s how we shared files before 2010) or through a USB stick. A typical update will install the aforementioned patches for the hypervisor and kernel, but it will overwrite user-land files like the dashboard. Finally, to prevent rolling back, the updater will finish by blowing an eFuse so, from then on, the Xbox will only be able to boot the updated version of the system.
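Blown eFuses are a one-way operation, so the count of blown fuses acts as a monotonic minimum version. A toy model of this anti-rollback trick (fuse counts and the version mapping are illustrative, not the real layout):

```python
# Toy model of eFuse-based anti-downgrade: fuses can only go from intact
# to blown, never back, so their count is a tamper-proof minimum version.

class EFuseBank:
    def __init__(self, size=16):
        self.fuses = [False] * size  # False = intact, True = blown

    def blow_next(self):
        # Irreversible in hardware: once blown, a fuse stays blown.
        self.fuses[self.fuses.index(False)] = True

    def blown(self):
        return sum(self.fuses)

def may_boot(bootloader_version, fuses):
    """Refuse any bootloader older than the fuse counter implies."""
    return bootloader_version >= fuses.blown()

fuses = EFuseBank()
fuses.blow_next()  # an update was installed...
fuses.blow_next()  # ...and another one
```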

Updates don’t usually require extra storage. However, when Microsoft released the ‘NXE’ update in 2008, the updater also installed static resources onto the Hard Drive (mainly themes and assets for the new 3D avatars) as NAND wasn’t big enough. This led Microsoft to increase the size of NAND in further revisions of the motherboard to bundle up to 512 MB of NAND, and later include internal memory units of up to 4 GB instead. You would only find these enhancements on hard-drive-less models.

Interactive shell

I think it’s time to take a break from so much theory and go over the different graphical interfaces that this console underwent. I know this is an article about computer architecture, but user interfaces are as important as all the fancy hardware we’ve been discussing. Also, user interface patterns go hand in hand with my interest in hardware!

In the following paragraphs, you may sense that I’m very critical of each interface, but I think it’s important to correctly identify the strengths and flaws of each design so we can understand the reasoning behind its next iteration.

The ‘Blades’ dashboard (2005)

Image

  • The home screen. The controls are shown in the centre, while the horizontal edges indicate the existence of other blades to navigate to.
  • List of games installed, previously downloaded from the Xbox Live Arcade (a digital marketplace). Notice how this is a full-screen view, as there wasn’t enough space on the previous type of screen.
  • Settings menu.
  • Storage menu, enabling the user to delete or copy user-only content. Notice how it removes the other blades’ indicators to reclaim a bit more screen space.
  • A simple avatar selector screen.
  • Pressing the guide button (a.k.a. the Xbox logo on the controller) for a long period would bring in the ‘Xbox Guide’, a popup menu offering various shortcuts.

The ‘Blades’ dashboard. Screens are from [79] and [80].

The first iteration of the user interface, now an element of nostalgia for some, was the Blades dashboard. Here, each group of services (game, multimedia, settings, etc.) is organised into different overlapping layers called ‘blades’. All share the same hierarchy level and focus, meaning the user can only open one blade/category at a time. To navigate around, the user must swap between blades and interact with the controls placed on each one.

In my opinion, this style reminds me of one of those brochures with eye-catching foldable designs (in fact, the manual that came with the console replicated this). Furthermore, the design language (typography, colour palette and blur effects) somewhat resembles Windows XP and its Media Center pack; even the first iterations of Windows Longhorn exhibit common traits.

Image
Windows Longhorn, Build 4015 (2003) [81]. Before it became Windows Vista, Microsoft had experimented with many types of UI patterns and colours. This build, in particular, seems to share small peculiarities with the Blades design.
Image
The Media Center app, included with Windows XP Media Center Edition (2005). This edition of Windows XP bundles an extra set of apps that target the TV, as opposed to the PC.

Truth be told, the overall design was strongly compromised to comply with both 4:3 and 16:9 aspect ratios, without being able to take advantage of the latter. The organisation of elements wasn’t very consistent either, as some views, like the Xbox Guide, felt like all controls had been squashed into a small space. The most limiting aspect, in my opinion, is the fact that the indicators for navigation (the list of available blades) were always shown on both edges of the screen. This consumed useful space for the opened blade. Thus, exceptional navigation segues had to be used to move away from the home screen hierarchy and reclaim full-screen space, at the cost of adding more inconsistency to the user interface.

Nevertheless, this is the GUI that gave this console an identity, and it’s obvious that it targeted the ‘young’ audience group, which worked fine for a few years.

The ‘New Xbox Experience’ (2008)

Image

  • The new home menu. Notice how the user can’t see all the available elements unless he/she scrolls.
  • As Xbox Live expands its repertoire, many types of applications can now be installed (not only arcades anymore).
  • The settings menu keeps the same layout but inherits a new colour palette.
  • The storage menu and its similar palette update. In this example, a custom background has been set by the user.
  • 2D avatars are still available, though users are now encouraged to define their own 3D avatar too. This will be used by games (à la Nintendo’s Mii).
  • As a nice tribute, the new guide menu resembles the old Blades design, which also helps provide more and better-organised shortcuts.

The ‘NXE’ dashboard. Screens are from [82] and [83].

As Microsoft expanded its catalogue of services and features, the navigation pattern of the blades turned into a liability (as it constrained the number of navigation screens on the top hierarchy) and the lack of available space became apparent in later software updates. In 2008, Microsoft released a revamped Dashboard interface called New Xbox Experience (NXE) which focused on ‘infinite scrolling’ patterns to offer a wider selection of services on the front page. It also incorporated other elements like a glossy filter and 3D transitions recently popularised by Windows Vista and the new range of Windows Live applications.

Image
Windows Vista (2007). Showing the use of the iconic ‘Flip 3D’ to switch between windows.
Image
Windows Live Mail during 2008 [84], featuring a Vista-like theme with glossy effects.

With the new design, screens could now enjoy full-screen space without being conditioned by the hierarchical level they were placed in. Curiously enough, the navigation layout is very similar to what Sony designed with XMB, where vertical arrows change category and horizontal arrows navigate through the elements of the same category.

The ‘Kinect’ update (2010)

Image

  • The new home screen built for Kinect controls. Notice how the container at the bottom right corner indicates what Kinect is capturing at the moment.
  • As Xbox Live expands its repertoire, many types of applications can now be installed (not only games).
  • The Settings menu with minor changes applied.
  • The same goes for the Storage menu.
  • 3D avatars get some enhancements as well.
  • The Guide menu goes through a minimalistic phase.

The ‘Kinect’ dashboard with a custom wallpaper. Screens are from [85].

With the arrival of Kinect, Microsoft had to adapt the Dashboard to remove its dependency on the controller. As part of the update, the designers replaced the glossy touch with faded greens and greys. The 3D aero-like touches were also succeeded by typical zoom-in animations. This coincides with the arrival of Windows 7 and its modest reduction of visual effects.

Image
Windows 7 (2009). Showing the new ‘Aero Peek’ to preview minimised windows.
Image
Windows Live Mail during 2011 [86], part of a desktop suite called ‘Windows Essentials’.

I don’t think there’s anything else to mention apart from the fact it only lasted a year before being replaced by a considerably more radical redesign.

The ‘Metro’ revamp (2011)

Image

  • The new home screen, where news, advertisements and applications share equivalent amounts of screen space. Some types of information (like adverts) are granted more space than the user would agree to.
  • The new app selector resembles a retail shelf.
  • The Settings menu hasn’t changed much.
  • The same applies to the Storage menu.
  • The guide menu got very few additions.

The ‘Metro’ dashboard. Screens are from [87].

The year 2010 saw the debut of Windows Phone 7 and a new type of interface called Metro, whose style relies on solid colours and white silhouettes. Due to its versatile design, Microsoft envisioned porting Metro to all of its platforms, which became a reality the following year with the arrival of Windows 8 and a new update for the Xbox 360. Though the degree of adoption varied drastically between devices, the Xbox 360 enjoyed a mostly unified implementation of Metro (unlike Windows 8, where the majority of applications still relied on the ‘classic’ design).

Image

Promotional screens of Windows Phone 7 (2010), showing the new Home screen, the contacts app (featuring the new ‘Panorama’ layout) and the lock screen.

The most significant changes lie in the home screen, which replaced its two-dimensional navigation style with a new multi-row distribution called ‘Panorama’. The latter borrows navigation patterns similar to the old Blades dashboard, except the new layout does not repeat the same mistakes. Icons now take precedence over labels, and their uniform shapes lead to a more apathetic composition. But hey, at least the layout is now consistent across all home screen views (although I must confess I miss the creativity of previous eras).


Games

Following the practice of previous articles, this section discusses the aspects of game development, distribution and the infrastructure Microsoft built to complement them.

Development ecosystem

Development follows the long-standing tradition where the manufacturer offers game studios cutting-edge development kits and libraries in exchange for a hefty fee. Additionally, with the advent of small companies venturing into the video game business, a new category of ‘indie’ game developers emerged, defining those studios that share the same enthusiasm and/or talent but are obliged to work on a smaller budget.

As a consequence, Microsoft marketed two types of development kits [88]:

  • Traditional game studios paying a full license would have access to the official Xbox Development Kit or ‘XDK’. This pack includes the entire SDK, many types of development hardware and, depending on their relationship with Microsoft, even prototypes.
  • Indie studios were offered an alternative suite called Microsoft XNA, which includes similar development hardware but a different SDK called ‘XNA’.

To provide more resources, Microsoft deployed the Game Developer Network (GDN) and the XNA Creators Club, two online portals where developers would have access to more documentation and tools.

Available tools and kits

Those using the expensive XDK will see that it follows in the footsteps of its predecessor. The APIs provided are largely based on DirectX routines to ease the learning curve for PC developers already familiar with DirectX on Windows. XDK implements DirectX version 9.0c (identified as ‘9_3’) extended with Xbox-unique features.

All in all, the suite includes a variety of libraries and tools, for instance:

  • Existing front-end Visual C/C++ 2005 compilers (Microsoft’s variants of C and C++), along with a new backend that generates PowerPC machine code. There are also PowerPC, xvs_3_0 and xps_3_0 assemblers (the latter two are Xenos’ microcode shader languages [89]). The front-end compilers were later updated to the 2010 version.
  • Microsoft’s Visual Studio 2005 as IDE. Later versions of XDK shipped with the 2008 version and, finally, the 2010 one.
  • Debuggers for the Xbox 360 Development Kit.
  • The well-known Direct3D API for graphics. Although it’s hardly a replica of the Windows version. For example, the shader compiler now generates microcode at compile-time, unlike Windows’ edition where it’s done at runtime (as Windows games must support a multitude of graphics cards). The APIs have also been extended to accommodate Procedural Synthesis, Predicated Tiling and many more cases. These have been documented by the Xenia emulator’s development team [90].
  • Familiar functions from Windows 2000’s Win32 API, now targeting the Xbox 360’s Kernel, such as the CreateThread() routine for multi-threading operations.
  • Old but soon-to-be-replaced APIs like DirectSound which, as seen in later revisions of the XDK, was soon succeeded by XAudio2. The latter was developed to take advantage of the unique features of the Xbox 360 and to unify the syntax with Windows’ DirectX APIs.
  • Other non-DirectX APIs like the Natural User Interface to operate Kinect; and Xbox 360 Unified Storage to abstract the large number of storage mediums available.

Now, for the budget studios using XNA: the latter was designed to facilitate cross-platform development for Windows XP and the Xbox 360 [91]. XNA works as a wrapper over DirectX, abstracting low-level functions behind routines that work on both Windows and the Xbox 360. Though additional code is still needed to accommodate the small discrepancies between the two unequal platforms [92], the platform still benefited many developers who were looking to port their games to the Xbox 360 without a lot of effort or cost. For those who are familiar with Windows’ development ecosystem, XNA is a successor of the old ‘Managed DirectX’ platform, and it includes a subset of the .NET Compact Framework [93].

Unfortunately, later years saw new versions of the XNA framework supporting new devices (like Windows Phone 7) while slowly distancing itself from the now-obsolete Xbox 360 hardware, until Microsoft abandoned XNA altogether in 2013 [94].

As a final note, for homebrew developers, libXenon is a free third-party library that provides low-level access to the hardware without copyright restrictions [95], although its executables can only be run on a hacked Xbox 360 with signature verification disabled.

Storage medium

Let’s now see how the distribution of content worked.

Image
Example of retail game

It may sound hard to believe, but for Microsoft, the same dual-layer DVD disc used for the original Xbox worked fine for their new console. I presume it was another cost-effective strategy, as the DVD disc got cheaper to produce and its readers could be outsourced to the most inexpensive and convenient manufacturer. Throughout the console’s lifespan, Microsoft relied on Hitachi, LG, Toshiba, Samsung, Philips, BenQ and LiteON - some working in pairs - to produce their DVD drives.

As a reminder, dual-layer DVDs (also called DVD-9) have a theoretical capacity of 8.5 GB. A big advantage compared to the new blue-laser discs of the Playstation 3 is that the Xbox 360 enjoys greater read speeds thanks to its 12x drive, another reason to drop the hard drive from the requirements.

On the other hand, games were greatly constrained by the usable capacity: those 8.5 GB were further limited by a secure filesystem called Xbox Game Disc 2 or ‘XGD2’. This was Microsoft’s successor to the original format employed to prevent unauthorised copies. The actual space available for games was down to 6.8 GB (80% of the total capacity) [96]. In 2011, after many third-party tools managed to crack the protection, Xbox Game Disc 3 or ‘XGD3’ was made available to developers. In it, Microsoft managed to revamp the security tricks and reduce the space reserved for anti-piracy mechanisms, leaving a total of 7.9 GB for games.
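The quoted figures check out with a bit of arithmetic; XGD2 hands roughly 80% of the disc to the game, while XGD3 pushes that to about 93%:

```python
# Sanity check of the usable-capacity figures quoted in the text.
total_gb = 8.5            # dual-layer DVD (DVD-9) theoretical capacity
xgd2_usable_pct = 6.8 / total_gb * 100   # XGD2: 6.8 GB for the game
xgd3_usable_pct = 7.9 / total_gb * 100   # XGD3: 7.9 GB for the game
```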

Network service

At the heart of this low-priced console was its highly-promoted online capabilities, which materialised in the form of a paid subscription service called Xbox Live (the name also encompasses the marketplace, software updates and other internet-based services).

As Xbox Live enjoyed successful adoption on the original Xbox, the successor was no stranger to following in its footsteps, and it even offered more commodities to get users to join online gaming. I presume the low price point of this console and continuous marketing efforts - including all the accessories that Microsoft shipped (webcams, headsets, etc.) - meant that Microsoft’s main source of profit relied on paid subscriptions and marketplace purchases. With the advent of 3D avatars, the company also found ways to capitalise on avatar customisation [104].

Image

Curiously enough, the console calculated a score for the user’s Xbox Live profile based on the number and type of achievements unlocked.
Want to reach the top ranks in the online leaderboards? Play more. Ran out of games? Buy more!

From the developer’s point of view, Xbox Live servers provided the infrastructure required for multiplayer, voice chat, profile management and so on. Xbox 360 games used a REST API (documented in the SDK) to talk to each endpoint. Furthermore, to authenticate with those servers, developers got a custom ‘Xbox Secure Token Service’ token that they needed to embed in their REST requests.
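To picture the pattern, here is a hedged sketch of assembling such a request. The endpoint path, header name and token are all invented; only the general shape (REST call plus an embedded service token) comes from the article:

```python
# Hypothetical sketch of an authenticated Live request. No network I/O;
# we only assemble the pieces. Host, path and header name are made up.

def build_live_request(endpoint, token):
    """Return the components of a token-bearing REST request."""
    return {
        "method": "GET",
        "url": "https://live.example.invalid" + endpoint,  # placeholder host
        "headers": {"Authorization": "XSTS " + token},     # invented scheme
    }

req = build_live_request("/users/me/achievements", "eyJhbGciOi...")
```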


Anti-Piracy and Homebrew

As the end of the article looms, I still haven’t set out how absurdly protected this system was. This console has an incredibly long and diverse history [105] involving interesting discoveries, unexpected methods of exploitation and aggressive forms of retaliation. Most importantly, this saga led to many third-party documents detailing the construction of the hardware and the peculiarities of the software. So, it’s fair to say that half of the content in this article wouldn’t have been possible without the countless hours that many hacking groups dedicated to researching and documenting their findings - a task not necessarily done to enable piracy.

Main targets

The first thing is to introduce the two main targets. As soon as the Xbox 360 reached the stores, Microsoft found itself defending two big fronts:

  • The Hypervisor, which holds control of the CPU and enforces signed executables, both of which had to be tackled if Homebrew or booting another OS (i.e. Linux) were to ever be possible.
    • By extension, this also affects user data (i.e. profiles stored on the hard drive and/or Memory Unit) but they implement extra layers of security. These are not mentioned here so we can focus on the main targets.
  • The DVD drive, which enforces the copy protection system and prevents burned discs from being executed as games. DVDs are inexpensive and most PCs of that era were equipped with a DVD RW drive, so cracking it looked extremely rewarding. However, it would only lead to piracy, not Homebrew (as the code is still signed). That is, unless someone discovers an exploit that depends on the DVD drive but, once ‘triggered’, enables executing Homebrew… we’ll see!
    • Curiously enough, the hypervisor and the rest of the operating system have no awareness of the state of the copy protection [106]; that’s delegated to the disc drive, which has the upper hand in accepting the inserted disc as a valid Xbox 360 game or not.

The hypervisor front

We’ll now learn how the hypervisor is protected and, in turn, how it protects the rest of the system.

A hidden cryptographic system

Remember how complex the L2 subsystem within Xenon was? Well, there’s one more thing to explain, and that is the inclusion of a hidden cryptographic block in it. I define it as ‘hidden’ since it’s not documented at all by Microsoft or IBM. I was made aware of it thanks to an incredibly insightful talk called ‘The Xbox 360 Security System and its Weaknesses’ by the Free60 group (led by Michael Steil and Felix Domke, the former also did ‘The Ultimate Game Boy Talk’!) [107] and ‘Security offence and defence strategies’ by Mathieu Renard [108], on which I rely for a big part of the information described here.

Image

Overview of the security components placed within Xenon.
They are strategically placed so the PPEs don’t need to be aware of them and/or perform any manual work.

Moving on, the cryptographic subsystem is split into distinct areas that perform unique functions:

The random number generator unit (or just ‘rand’) produces random sets of numbers without exposing a predictable pattern, or at least it attempts to. This is useful for supplying extra parameters to encryption routines, so they are difficult to trace back (and thus, reverse engineer) or replicate.

You may now ask ‘Why can’t this be done with software?’ and, well, in the world of traditional computing, there’s no such thing as ‘pure randomness’. Computers can only generate pseudo-random numbers using arguments that may potentially be traced. So, having a dedicated unit for this task further obfuscates reverse engineering and protects the CPU from possible snooping (as this block is hidden within the CPU and not scattered across the motherboard).

I guess the most optimal solution would be to connect the CPU to a quantum computer, but we are not there yet! (somewhat close though).
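The ‘traceable’ problem is easy to demonstrate: a pseudo-random generator is fully determined by its seed, so anyone who recovers the seed recovers every ‘random’ value ever produced. A quick illustration with Python’s standard generator:

```python
# Why software pseudo-randomness is traceable: the same seed always
# reproduces the exact same sequence. A leaked seed means a leaked
# keystream, which is what a dedicated hardware unit tries to avoid.

import random

def keystream(seed, n):
    rng = random.Random(seed)
    return [rng.randrange(256) for _ in range(n)]

a = keystream(1234, 8)
b = keystream(1234, 8)  # identical: nothing random once the seed is known
```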

Security implementations

Let’s see what Microsoft came up with using all those components.

The convoluted boot process you’ve read before was designed to implement a sophisticated chain of trust. This makes sure that, once the hypervisor is ready to execute userland code, the latter is always encrypted and signed by no other than Microsoft.

To efficiently obfuscate the boot process, multiple types of cryptographic primitives are employed. For instance, the second bootloader (2BL/CB) is decrypted using RC4 (a fast algorithm), whose decryption key is constructed at run-time using HMAC-SHA. Then, the decrypted code is hashed (using SHA-1 and ROT). Finally, to check that 2BL hasn’t been tampered with, the calculated hash is compared against a stored hash pre-signed with Microsoft’s private key (using RSA).
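A simplified re-creation of that recipe follows: derive an RC4 key with HMAC-SHA-1, decrypt, then hash the result to verify it. The keys and payload are made up, and I’ve left out the RSA signature over the stored hash and the per-console secrets that the real console mixes in:

```python
# Simplified CB decryption sketch: HMAC-SHA-1 key derivation + RC4 +
# SHA-1 verification. All key material here is fictional.

import hashlib, hmac

def rc4(key, data):
    # Plain RC4: key scheduling (KSA) followed by the keystream loop (PRGA).
    S = list(range(256))
    j = 0
    for i in range(256):
        j = (j + S[i] + key[i % len(key)]) % 256
        S[i], S[j] = S[j], S[i]
    out, i, j = bytearray(), 0, 0
    for byte in data:
        i = (i + 1) % 256
        j = (j + S[i]) % 256
        S[i], S[j] = S[j], S[i]
        out.append(byte ^ S[(S[i] + S[j]) % 256])
    return bytes(out)

ONE_BL_KEY = b"key-baked-into-1BL"    # fictional key stored in 1BL
header_nonce = b"key-from-CB-header"  # fictional key from CB's header
rc4_key = hmac.new(ONE_BL_KEY, header_nonce, hashlib.sha1).digest()

plaintext_cb = b"second stage bootloader code"
encrypted_cb = rc4(rc4_key, plaintext_cb)   # as it would sit on NAND
decrypted_cb = rc4(rc4_key, encrypted_cb)   # RC4 is symmetric

# Verification: hash the decrypted code and compare with the stored hash.
stored_hash = hashlib.sha1(plaintext_cb).digest()
valid = hashlib.sha1(decrypted_cb).digest() == stored_hash
```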

For increased protection, each console embeds a unique CPU key imprinted on the eFuses just before the console leaves the factory. At runtime, the CPU key is mixed with other parameters to derive part of the bootloaders’ keys. All in all, this means cracking one CPU key won’t affect other consoles.

Once the Hypervisor is loaded into main RAM, the chain of trust is in place and no code will be executed without the Hypervisor’s permission, especially since the latter possesses W^X abilities.

The hypervisor’s duties

Once the hypervisor is fully running with Level 1 privilege, it dips into the internal SRAM of Xenon to provide the following functions:

  • Act as a Memory Management Unit (MMU) by handling its private virtual memory page table, where it keeps track of the memory boundaries of every program and whether they are executable or not. This protects the hardware from buggy software producing buffer overflows.
  • In cooperation with the L2 block, it ensures all executable code is encrypted when stored in RAM.

To execute any userland application, the Kernel copies the (unencrypted) code into memory and the hypervisor checks that it’s been signed with Microsoft’s private RSA key [112]. If the signature is valid, the Hypervisor proceeds to duplicate the executable code in memory, and in doing so the L2 block automatically encrypts it. Finally, the Hypervisor marks the respective memory pages as ‘executable’ and the rest is history.

Protecting the DVD reader

Due to the choice of a highly popular and affordable medium for game distribution, concerns about piracy resonated at Microsoft’s headquarters. The anti-piracy system became a joint project between Microsoft and their suppliers, with varying results depending on the manufacturer of the DVD reader.

On Microsoft’s side, DVD drives authenticate with the system using a unique DVD key that must match an internal key handled by the Hypervisor; the DVD key is derived from the CPU key [113]. If the check fails, the reader still works, but the console won’t be able to launch Xbox 360 titles.
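The handshake can be sketched as both sides deriving the same per-console key from the CPU key. The derivation below (HMAC-SHA-1 with a fixed label) is my own stand-in, not the real algorithm:

```python
# Sketch of the drive/hypervisor key agreement: games only launch when
# both sides hold the same DVD key derived from the per-console CPU key.
# The HMAC-based derivation and the label are invented for illustration.

import hashlib, hmac

def derive_dvd_key(cpu_key):
    return hmac.new(cpu_key, b"dvd-key-label", hashlib.sha1).digest()

CPU_KEY = b"per-console-secret"        # burnt into the eFuses at the factory
hypervisor_key = derive_dvd_key(CPU_KEY)
drive_key = derive_dvd_key(CPU_KEY)    # programmed into the drive's firmware

def drive_accepted(hv_key, dv_key):
    """True only when both sides agree; constant-time compare avoids leaks."""
    return hmac.compare_digest(hv_key, dv_key)
```

Swapping in a drive from another console (or a different CPU key) breaks the match, which is one reason drive replacements had to be re-flashed with the original key.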

Additionally, to avoid being read by conventional DVD readers and preclude exact duplicates, games are mastered using the new Xbox Game Disc 2 (XGD2) data format, which engraves a couple of anti-piracy tricks [114]:

  • A misleading table of contents that deceives conventional DVD players into loading a static video, which asks the user to insert the disc into an Xbox 360 console.
  • Within the file system, there’s a database containing multiple checks called challenges or ‘security sectors’ that the drive performs to confirm that the current disc is valid. In the case of the DVD drive, these challenges simply enumerate faulty sectors deliberately made during mastering that the drive needs to find. As conventional burners don’t account for them, a duplicate would never pass the verification process.
  • The outer area of the disc exposes a burst-cutting area, although it doesn’t appear to include meaningful data for copy protection.

Finally, add to that the fact that the hypervisor only launches executables signed with Microsoft’s private keys.
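The ‘security sector’ trick above can be modelled in a few lines. The sector numbers and the read results are invented; only the principle (deliberately faulty sectors that a burner unintentionally ‘fixes’) comes from the article:

```python
# Toy model of security-sector challenges: the disc's database lists
# sectors that were mastered as faulty, and the drive checks that reads
# of those sectors really fail. A burned copy writes every sector
# correctly, so it answers the challenge wrong.

PRESSED_DISC = {0x1000: "bad", 0x2000: "good", 0x3000: "bad"}
BURNED_COPY  = {0x1000: "good", 0x2000: "good", 0x3000: "good"}  # burner 'repaired' them

CHALLENGES = [0x1000, 0x3000]  # sectors the database says must be faulty

def disc_is_genuine(disc, challenges=CHALLENGES):
    return all(disc[sector] == "bad" for sector in challenges)
```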

Fallback techniques

By now it’s obvious that anti-piracy measures can only hold for so long. Hence, the Xbox 360 was designed to have a mostly-overridable operating system. Anything but the boot ROM could be updated, as long as the correct encryption keys were used (of which Microsoft was the sole owner).

Thus, throughout the console’s lifecycle, the company issued software updates that patched the CB (second-stage bootloader), the Kernel, Hypervisor and the DVD reader’s firmware. You will soon see the reasoning behind each one of these areas.

Additionally, Microsoft promoted (or actually, enforced) such updates by only permitting Xbox Live access from consoles running the latest system. This discouraged users from holding on to old versions and, combined with the anti-downgrade eFuses, reduced the availability of exploitable consoles, even for research purposes.

Finally, in the event of a modified console gaining access to Xbox Live services, Microsoft implemented many protections behind the scenes to keep hacked consoles and/or cheaters out of their network, including 'phoning home' routines that helped Microsoft detect 'special' consoles, followed by 'permanent bans' through console blacklisting.

Defeat

If you've been following the Xbox 360 scene for some years, you probably know by now that this console enjoyed some of the most creative exploits of its generation. What started as arbitrary code execution techniques (none of them particularly simple) was soon followed by commercial hardware sold for these purposes.

I’ll try to give a proper summary of events now, but if at the end you feel hungry for more, well-known hacker ‘15432’ wrote a very informative three-part article that dives into every breakthrough until 2020 [117][118][119].

The DVD drive saga

History seems to dictate that whenever a console includes a widely-adopted medium like the CD or DVD, it's just a matter of time before that area is cracked. Especially since piracy is easier to deliver than homebrew.

In May 2006 (before the Playstation 3 reached the stores), user ‘commodore4eva’ released the Xtreme Firmware, a replacement DVD reader firmware for the Xbox 360s bundled with early Toshiba-Samsung drives [120]. The new firmware instructed the drive to look for hardcoded security blocks instead, which pirated copies would have relocated.

The firmware was flashed through ATA commands, but this required a computer with a spare SATA socket and special flashing software. Consequently, users had to open up the console, connect the drive's SATA data cable to their computer, run the flasher while the console supplied power to the drive, and then re-assemble everything. The ability to flash came from a hidden 'maintenance mode' discovered in the drive's firmware, which was subsequently reverse-engineered and re-enabled. If the process was carried out successfully, the Xbox 360 would now read pirated games just like genuine ones.


Example of a DVD drive flashing tutorial found on Youtube [121]. Here, the Xbox 360’s drive connects to the PC using SATA while the console still supplies power (as the power connector is proprietary).

That's pretty much how the DVD drive got cracked. As time went by, the techniques evolved but the fundamental idea remained the same (flash a custom firmware). Subsequent readers increased their security to seal backdoors, and Microsoft helped by bundling DVD drive firmware within their 'software updates' to undo any unauthorised work done by the user. In later years, readers levelled up to the point that the user had to drill a hole into the drive's System-on-Chip to enable flashing [122].

From Microsoft’s headquarters, a new disc format called XGD3 was devised to improve the collection of challenges. Additionally, XGD3 increased the data capacity by expanding the writing limits beyond the standard area (hindering duplication along the way) [123].

The hypervisor saga

The fundamentals of the hypervisor may seem flawless at first, however, there’s a technical compromise hidden underneath it: external components (the southbridge, the GPU, etc) need access to main RAM too. Yet, they don’t understand the hypervisor’s encryption mechanisms (as that’s a matter between the CPU and main RAM). So, how can the I/O make use of RAM? Simple, through direct access (DMA) and unencrypted data. This is not a security breach per se, as the hypervisor still enforces its security on CPU-executable pages.


XeLL-Reloaded, a Linux and Homebrew bootstrapper.
Loading this program into the Xbox 360 will soon become the No. 1 goal of any modder.

Nonetheless, this is what the early intrusions focused on. In other words, can we inject unsigned code into main RAM and then trick the hypervisor into executing it? Well, in February 2007, an anonymous hacker published a privilege escalation exploit that relied on the 2005 game King Kong [124]. In the report, a clever chain of events was outlined to get arbitrary code execution at the hypervisor level (meaning full hardware access). The way it works is not simple to explain, but I’ll try my best. Let’s see…

To organise all the routines available, the hypervisor stores a syscall table in main RAM [125], along with a handler that traverses such table.

When a program invokes a syscall, the handler starts by making sure the requested syscall number is valid. This is done by checking that its value is less than 0x61 (meaning there are up to 97 syscalls available). Finally, the handler translates the syscall number into a virtual address in memory (where the routine is found) and the CPU continues execution there.

Another concept to remember is that the hypervisor is stored in RAM in encrypted form, and by extension all syscall routines are encrypted too. Consequently, the resulting virtual address pointing to a routine has the encryption flag set. Thus, the L2 block seamlessly decrypts the data fetched from RAM before handing it to the CPU.

With that in mind, hackers took a look at the assembly code of the syscall handler and found a major flaw. Essentially, the syscall number is passed through a register. Registers in Xenon are 64 bits long, yet the validity check is implemented with the cmplwi instruction (an unsigned 32-bit comparison). This means that if a program (previously authorised by the hypervisor) were to request a syscall with a value greater than 0xFFFFFFFF, like 0x20000000.0000002A, the hypervisor would only verify 0x0000002A, which passes the check.

Now, where am I going with this? Well, if you recall, the upper 32 bits on 64-bit virtual addresses are used as flags to instruct the L2 block to encrypt/decrypt or hash the memory values, and if the flags are all zeroes, the L2 will interpret the data as unencrypted and pass it right away. That being said, if by some chance an external piece of unencrypted code were to be inserted in RAM through external means, there might be a way to execute it through the flawed syscall handler…

Protecting King Kong

The King Kong exploit was a huge step for the Homebrew scene. I'm not sure if you noticed, but no encryption keys were needed whatsoever (so much for having a strong security system in place…).

Be that as it may, Microsoft had already been made aware of it and patched it in January 2007 (one month before the report was published). Nonetheless, efforts continued towards finding similar exploits for updated consoles. Additionally, hackers could now obtain an outdated console and make use of the King Kong exploit to expand their research.

As a result, throughout the same year of the King Kong events (2007), new discoveries were made:

Once CB, the second-stage bootloader, was reverse-engineered, a new backdoor was discovered. One of the checks CB performs is comparing a string stored in CB's headers against checksums of the NAND. Well, if that string (called 'pairing information') turns out to be all zeroes, the check is skipped and the boot process continues while ignoring the CPU key. The only downside is that no updates are applied, and the Kernel ends up launching an alternative app called MfgBootLauncher (presumably used during manufacturing) instead of the Dashboard.

Initially, this discovery didn't attract attention, as it didn't seem to lead to any interesting development. However, this changed after Microsoft noticed mentions of MfgBootLauncher and subsequently released a new system update, which ended up attracting a lot of attention, to the point that this special 'manufacturing mode' became an opportunity for homebrew!

Before we continue with this story, there’s one more hack that needs to be explained…

Setting King Kong free

As homebrew development kept evolving and the King Kong method became too cumbersome, the focus shifted to finding new techniques for automating the hypervisor exploit, and the results were astonishing, to say the least.

Two years after the King Kong exploit was published, in August 2009, Microsoft released an unexpected software update (2.0.8498) that, once again, overwrote the CB (second-stage bootloader) and increased its respective eFuse counter [131]. Not only did this update write over crucial parts of the system (at the risk of breaking it irreparably), but it also drew the attention of many hackers, who set out to discover what Microsoft was trying to patch.

The Free60 group, in their continuous efforts to deliver Linux on the Xbox 360, advised users to hold on to their current system and not succumb to the update. Behind the scenes, they already knew about the implied exploit and were preparing to release a new technique that the public could take advantage of.

The dawn of Homebrew

Even though the aforementioned discoveries were outraced by Microsoft’s updates, the homebrew community kept gaining momentum and new applications emerged to become a ‘must have’ for any modding enthusiast:

  • XeLL (from Xenon Linux Loader), later forked as XeLL-Reloaded, is the de-facto payload used with the Hypervisor exploit [133]. It sits as a second-stage bootloader with an amount of functionality analogous to a Victorinox for Scouts. XeLL can boot Linux or a XEX executable written with libxenon, an alternative SDK designed for homebrew development. Furthermore, upon loading, its text interface automatically displays the eFuse data and other low-level information on-screen so the user can write it down if needed. If that's not enough, its self-hosted HTTP site provides extra controls to dump the contents of the eFuses and NAND on-the-fly.
  • freeBOOT is a modification of the official operating system that disables the Hypervisor's signature verification and removes hardcoded caps set by Microsoft (e.g. the security sector checks preventing the use of third-party hard drives), among other things [134]. freeBOOT is loaded from XeLL and proceeds to boot the official system like any other unmodified console, except that non-signed executables can be launched from the Dashboard from now on. At the time of this writing, applying freeBOOT remains the main goal for most homebrew users.
    • To produce a 'freeBOOTed' image (eventually flashed into the NAND), the CPU key of the console is needed. This means the console first needs to load XeLL (using a previous exploit) so the key can be extracted. Then, a new NAND image with freeBOOT's patches applied can be generated.
  • FreeStyle Dashboard and Aurora are two replacements for the original Dashboard that provide enhanced controls for executing homebrew apps, loading up game dumps and tweaking low-level settings (e.g. fan speed). While they are meant to overtake the official Dashboard, they are still loaded like any other XEX executable.
  • XeX Loader and XeX Menu are homebrew launchers that can be run from the official Dashboard.
  • Dashlaunch is another program that provides a collection of patches for the official system [135], except they are loaded at runtime (avoiding the need for flashing the NAND). One of Dashlaunch's functions is the ability to automatically load an executable after the Dashboard boots; this is often used to quickly load the alternative homebrew-friendly FreeStyle Dashboard or Aurora.


FreeStyleDash running on my modified console, offering many controls and indicators not found on the official dashboard. Veteran Xbox 360 users will notice its interface resembles the iconic NXE era!

To facilitate the process of building a NAND image with XeLL and/or freeBOOT, many Windows programs such as XeBuild, J-Runner and AutoGG were developed to automate it as much as possible.

Outracing Microsoft’s advantage

As joyful as the previous events may seem, I'm afraid to say that the SMC/JTAG trick was released just after Microsoft patched it! If you are curious, the software update blacklisted King Kong-vulnerable Hypervisors so they can't be loaded by CB. Moreover, the new CB also blew another eFuse, so fighting back was made impossible… unless a new unimaginable hack were to be published…

First demo showing the RGH hack on a Slim (Trinity) console [136].

After two idle years since the last major breakthrough, in August 2011, user 'GliGli' submitted a technical report outlining the discovery of the Reset Glitch Hack (RGH) [137]. This new technique allowed any type of Xbox 360 (no matter which update was installed) to run an arbitrary version of the Hypervisor, just like the SMC/JTAG hack, except that it now needed an external chip soldered in.

In a nutshell, RGH exploits a fundamental flaw in the CPU's circuitry: if the CPU receives a short pulse (10 to 60 ns long) on its RESET line, it will not fully reset but continue execution in a corrupted state, which can work to the hacker's advantage if the CPU is in the middle of asserting something crucial, like verifying the bootloaders. Additionally, user 'cjak' found out that by latching a special line on the motherboard called CPU_PLL_BYPASS, the CPU is slowed down to ~25 MHz, thereby making room to glitch it.

With this, GliGli and a group of hackers grabbed a generic but fast CPLD board (similar to an FPGA), soldered it to many useful points on the motherboard and programmed it to:

  1. Keep track of the Xbox’s POST signals to know when and where to latch the buses.
  2. Slow down the CPU when CB is about to validate CD’s hashes.
  3. Latch the RESET line when CB is in the middle of a memcmp() function (related to the hash verification).
  4. Revert the CPU’s original speed and hope that the memcmp() has magically succeeded.
  5. If the process worked, a custom payload is executed.

From then on, a board programmed to glitch the Xbox 360's CPU became known as a glitcher.

Because the process relies on correct timings (from a combination of POST inputs and hardcoded timers) and very precise signalling, success is not guaranteed on every boot. To automate retries, the SMC's firmware is altered to keep rebooting the console until the boot stage succeeds.

And with this, the homebrew community was gifted with an unfixable exploit to load homebrew.

The endgame

With the approach of the Xbox 360’s successor, Microsoft took one last breath and shipped a redesigned version of the console called ‘E’, and within it, another motherboard revision called Winchester. The latter finally removed the POST signals and filtered the CPU RESET line from external disturbance [147], which rendered the RGH hack, after three years since its discovery, obsolete.

However, for the compatible motherboards (which are still found in abundance), only wonderful things awaited. Apart from the aforementioned RGH variants, there was still a golden discovery to be uncovered. Fast forward to November 2021, when 15432 surprised the community again by publishing RGH 3.0, a universal RGH variant to rule them all. Behind the scenes, the new technique is an evolution of RGH 1.2 V2 that no longer requires a glitcher [148]. This was done by implementing the glitching stage in the SMC, which now only needs two wires soldered onto the motherboard (specifically, POST and PLL) to execute the hack.


That’s all folks


Two Xbox 360s of mine, a ‘phat’ Zephyr next to a ‘slim’ Trinity, placed side-by-side for contemplation purposes.

Congratulations on reaching the end of the article!

I think this writing has shown us again that you don't need expensive technology to conquer the new generation. Don't get me wrong, I still consider Cell a great invention that brought many technical advancements to the consumer market, but I perceive that its capabilities differed drastically from the rest of the chips in the PS3. By contrast, the Xbox 360 presented a more harmonious mix where all components could work together to unlock each other's potential. In any case, every console carries its unique set of strengths and weaknesses. That's why I like writing about all of them.

Moving on: to gather materials for this study, I ended up hoarding an absurd number of models, mainly because I needed to find consoles with specific system versions installed (as they can't be downgraded). This became a hit-and-miss challenge and, to make matters worse, my old console broke just before I started this site. Anyhow, at the end of the day I found myself surrounded by:

  • An unused Xenon from 2005 whose owner was afraid to open the box. To my surprise, it's running 2.0.1888 (too relic-like for me to fiddle with), so I had to let it be…
  • A Zephyr model that the previous owner sent for repair, but by the time Microsoft shipped it back, the owner had already bought another one. It’s got an NXE system installed.
  • A Jasper from a store clearout that also never got used; it's running NXE too.
  • A Falcon with RGH 1.2. I bought it to attempt to downgrade it to a Blades-era system, which turned out to be impossible without proper tools. Fortunately, thanks to the software provided by Octal’s Console Shop, it’s now straightforward to do so.
  • My old Trinity that I had since I was a teenager. Unfortunately, it got cursed with the Red Lights of Death and I couldn’t manage to repair it. Luckily, after RGH 3.0 came out, I bought a cheap spare Trinity, applied RGH and transplanted the internal Flash unit to recover my old profile and saves. All of the game screenshots in this article come from this model.

With the write-up now finished, I’m glad I can finally put this search to rest. By the way, this is the last article of the 7th generation saga!

So, regarding my next steps, I’m considering returning to the 90s era to pay tribute to historical consoles that slipped my mind, or maybe do some more work on the site… we’ll see!

Until next time!
Rodrigo

