
PlayStation Architecture | A Practical Analysis

source link: https://www.copetti.org/writings/consoles/playstation/

Supporting imagery

The original PlayStation.
Released on 03/12/1994 in Japan, 09/09/1995 in North America, 29/09/1995 in Europe.

The PS One (slim revision).
Released on 07/07/2000 in Japan, 19/09/2000 in North America, 29/09/2000 in Europe.
This is what you get after cramming so many custom ICs into a single one.

Main architecture diagram
The Bus Interface Unit is also connected to special ports of the GPU and SPU.


A quick introduction

Sony knew that 3D hardware can get very messy to develop for. Thus, their debuting console will keep its design simple and practical… Although this may come with a cost!

This section dissects the ‘Sony CXD8530BQ’, one of the two big chips this console houses. It’s what we would call a ‘System-on-Chip’ in today’s world.

The origins

The main processor is one of those ‘x designed by y, based on z and second-sourced from w’ which is a bit dense to summarise in a few sentences, so why don’t we start with some historical context?

The early nineties were marked by a change in the fates of many popular CPUs. The once-leading 8-bit CPUs, such as the Z80 and 6502, were already out of the spotlight and Motorola’s famous 68000, along with other 16-bit designs that enjoyed success in the late 80s, had become candidates for replacement. Even at this time in the PC world, Tanenbaum, in his celebrated debate with Torvalds, gave Intel’s x86 architecture only five more years until its demise from the home market.

At first it may look as if technological development had hit a wall at this point. But in reality there was a new wave of relatively-unknown CPUs making their way into mainstream devices. Many of these designs had originated in academia, and so existed to prove particular sets of design ideas. Examples of next-gen CPUs at this point included:

  • MIPS: Adopted by Silicon Graphics Incorporated (targeting graphics workstations).
  • PowerPC: Adopted by Apple (targeting desktop publishing).
  • SPARC: Created by Sun Microsystems (targeting servers and business workstations).
  • ARM: Created by Acorn (targeting the consumer market and workstations).

All of these processors had something in common: They followed the Reduced Instruction Set Computer (RISC) discipline, which radically shifted how these chips had to be designed and programmed. One rule of RISC CPUs dictated that a single instruction can’t mix memory with register operations, allowing hardware designers to simplify the circuitry that executes instructions… and then enhance it with parallelism techniques.

The offering

The resulting CPU core runs at 33.87 MHz and features:

  • The MIPS I ISA: The first version of the MIPS instruction set. Among many things, words are 32-bit long and the instruction set includes multiplication and division instructions.
  • 32 general-purpose registers and 2 multiplication/division registers: These are 32-bit as well. One general-purpose register is always zero (R0), which is common in RISC processors.
  • 32-bit data bus: In the PS1, the data bus forks into two buses.
    • Main Bus (32-bit) → Connects to main RAM, the MDEC and GPU.
    • Sub Bus (16/8-bit) → Connects to the rest of the chips and I/O. This bus is bridged by the Bus Interface Unit, which also enables access to special ports of the GPU and SPU.
  • 32-bit address bus: Up to 4 GB of physical memory (i.e. RAM, memory-mapped I/O, etc) can be accessed.
  • 5-stage pipeline: Up to five instructions can be executed simultaneously (a detailed explanation can be found in a previous article).
  • 4 KB of instruction cache: It can be ‘isolated’ as well, allowing the program to manipulate instruction cache directly.
    • Oddly, there is no data cache. The 1 KB of memory normally used for the data cache is mapped to a fixed address. This area is also called Scratchpad (fast SRAM).

To do something meaningful, Sony provided 2 MB of RAM for general-purpose use. Curiously enough, they fitted Extended Data Out (EDO) chips on the motherboard. These are slightly more efficient than typical DRAM, obtaining lower latency.

Taking over the CPU

At some point, any subsystem (graphics, audio or CD) will require large chunks of data at a fast rate. However, the CPU will not always be able to keep up with the demand.

For this reason, the CD-ROM Controller, MDEC, GPU, SPU and the Parallel port have access to an exclusive DMA controller whenever they require it. DMA takes control of the main bus and performs a data transfer. The resulting rate is a lot faster than relying on the CPU, though the latter is still needed to set up a DMA transfer.

Also, bear in mind that once DMA kicks in, the CPU can’t access the main bus. This means the CPU will be idling unless it’s got something in Scratchpad to keep it busy!

Complementing the core

Like other MIPS R3000-based CPUs, the CW33300 supports configurations with up to four coprocessors. Sony customised it with three:

Identified as ‘CP0’, the System Control Coprocessor is a common block found on MIPS CPUs. In R3000-based systems, like this one, the CP0 controls how the cache is implemented, thus enabling direct access to data cache (in the form of ‘Scratchpad’) and instruction cache (with ‘cache isolation’). The control coprocessor is also responsible for handling interrupts, exceptions and breakpoints; the latter are useful during debugging.

Wait, shouldn’t co-processors only expand CPU functions? Why is CP0 tightly coupled to the CPU?

Indeed, R3000 cores depend on the system control coprocessor to be able to make use of many components, but whether this should be ‘legal’ or not comes down to the interpretation of the word ‘co-processor’. According to MIPS, a co-processor is not strictly an optional part of the CPU; it may also command the CPU’s surroundings (cache, interrupts, etc). Hence, a co-processor can be an integral part of the system. This is just something to bear in mind while talking about MIPS-related systems.

Later R4000-based systems incorporated a Memory Management Unit (MMU) and a Translation Lookaside Buffer (TLB) into this block, thereby increasing its capabilities and taking up new roles.

Missing units?

So far, we’ve got a ‘CP0’ and a ‘CP2’, but where’s the ‘CP1’? Well, that’s reserved for a Floating Point Unit (FPU) and I’m afraid Sony didn’t provide one. This doesn’t mean the CPU can’t perform arithmetic with decimal numbers; it just won’t be fast enough (software-emulated FPU) or too precise (fixed-point arithmetic instead).

Game logic (involving physics, collision detection, etc) can still get by with fixed-point arithmetic. Fixed-point encoding stores decimal numbers with a fixed number of decimal places. This implies a loss in precision after certain operations, but remember, this is a video-game console, not a professional flight simulator. Hence, the precision-performance trade-off is a reasonable one.
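
To make the idea concrete, here is a minimal sketch of fixed-point arithmetic. The choice of 12 fractional bits and the helper names (to_fixed, to_float, fx_mul) are assumptions made for illustration; games were free to pick whatever split suited them.

```python
FRAC_BITS = 12            # assumed number of fractional bits
ONE = 1 << FRAC_BITS      # the value 1.0 in fixed-point form (4096)

def to_fixed(x: float) -> int:
    """Encode a real number as a fixed-point integer."""
    return int(round(x * ONE))

def to_float(f: int) -> float:
    """Decode a fixed-point integer back to a real number."""
    return f / ONE

def fx_mul(a: int, b: int) -> int:
    """Multiply two fixed-point numbers; the shift drops the extra fractional bits."""
    return (a * b) >> FRAC_BITS

a = to_fixed(1.5)             # 6144
b = to_fixed(2.25)            # 9216
product = fx_mul(a, b)        # 13824, which decodes to 3.375

# Precision loss: 1/3 cannot be stored exactly with 12 fractional bits.
third = to_fixed(1 / 3)       # 1365
error = abs(to_float(third) - 1 / 3)
```

Everything here is plain integer work (one multiplication and one shift), which is why the lack of an FPU hurts less than it first seems. The price is the small rounding error left in `error`.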

By the way, sometimes I mix up ‘fixed-point’, ‘floating-point’, ‘decimal’ and ‘integer’ number types (hopefully not anymore!). If you feel the same, I recommend taking a look at Gabriel Ivancescu’s quick summary (see the ‘Sources’ section) to quickly refresh those concepts.

Delay galore

As we’ve seen before, the CW33300 is a pipelined processor, meaning that it queues up multiple instructions and executes them in parallel at different stages. This hugely improves instruction throughput, but if it’s not controlled properly, it can lead to pipeline hazards, resulting in computational errors.

The MIPS I architecture is susceptible to control hazards and data hazards, which means that instructions may get executed when they shouldn’t be; and that instructions may operate with outdated data before it’s been updated.

Instructions from ‘Spyro The Dragon’ visualised on the NO$PSX debugger.
Notice how LW (load word from memory), JAL (jump and link) and BNE (branch on not equal) are followed by a delay slot to prevent hazards.
Instructions marked in red are fillers (useless instructions).
Instructions marked in blue perform meaningful operations.

Consequently, MIPS I CPUs exhibit the following behaviour:

  • Any instruction following a ‘branch’ or ‘jump’ type opcode is executed unconditionally: Thus, developers have to manually fill the pipeline with modest instructions (such as calculate 0 plus 0) after the branch or jump to mitigate the hazard. These fillers are called branch delay slots.
    • Modern CPUs converted this phenomenon into an advantage: Branch prediction. By adding extra circuitry to detect the hazard, the CPU discards the new computations if the branch/jump condition wasn’t met. But if it was, then the CPU has saved some time.
  • ‘Load’ instructions don’t stall the pipeline until the retrieved data is made available: The second stage of the pipeline (called ‘RD’ or ‘Read and Decode’) gathers the operands, which will be used to perform a computation at the third stage (ALU). The fourth stage (‘MEM’, from ‘access MEMory’) looks for data in memory (i.e. main RAM, CD reader, etc). Now, here’s the problem: by the time a load instruction has gathered the data from outside, the following instruction has already fetched its operands. This means that an instruction depending on the value of the previous load instruction requires a filler in-between, so the correct operands can be fetched on time.

As we can see from the example, some delay slots are filled with meaningful instructions, which perform computations that are not affected by the hazard. Hence, delay slots don’t always imply a waste of cycles.

In most cases, the compiler or assembler will automatically re-arrange instructions to fill in slots, or add useless fillers as a last measure. So, all in all, this phenomenon is a bit of a mixed bag.
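
The load delay described above can be imitated with a toy model. This is only a sketch: the three-register machine, the tuple-based instruction format and the `run` function are hypothetical, and the model assumes a loaded value becomes visible exactly one instruction late.

```python
def run(program, memory):
    """Tiny model of a MIPS I style load delay slot (not a full CPU emulator)."""
    regs = {"r0": 0, "r1": 0, "r2": 0}
    pending = None  # (register, value) from a load that has not landed yet
    for op, dst, src in program:
        landing = pending     # the previous load lands after this instruction reads
        pending = None
        if op == "lw":        # load word: the result is delayed by one instruction
            pending = (dst, memory[src])
        elif op == "move":    # copy register: reads whatever is visible right now
            regs[dst] = regs[src]
        elif op == "nop":     # filler: does nothing
            pass
        if landing is not None:
            regs[landing[0]] = landing[1]
    if pending is not None:   # let a trailing load land before returning
        regs[pending[0]] = pending[1]
    return regs

memory = {0x10: 42}

# Without a filler, 'move' reads r1 before the loaded value has arrived.
stale = run([("lw", "r1", 0x10), ("move", "r2", "r1")], memory)

# With a filler in the delay slot, 'move' sees the loaded value.
fixed = run([("lw", "r1", 0x10), ("nop", None, None), ("move", "r2", "r1")], memory)
```

In the first program `r2` ends up holding the old contents of `r1` (zero), while in the second it holds 42. A compiler would try to replace the `nop` with an unrelated, useful instruction.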


Graphics

To recap, a large part of the graphics pipeline is carried out by the GTE. This includes perspective transformation (which projects the 3D space onto a 2D plane using the camera’s perspective) and lighting. The processed data is then sent to Sony’s proprietary GPU for rendering.

Organising the content

The system features 1 MB of VRAM that will be used to store the frame buffer, textures and other resources the GPU will require to render the scene. The CPU can fill this area using DMA.

The type of chip fitted (VRAM) is dual-ported, like the Virtual Boy’s. VRAM uses two 16-bit buses, which enables concurrent access between the CPU/DMA/GPU and the Video encoder.

Memory layout using VRAM

Memory layout using SGRAM

Though in later revisions of this console, Sony switched to SGRAM chips (the single-ported option using an individual 32-bit data bus). Boo!… Well, to be fair, each one comes with its pros and cons. One thing is for sure: due to the timing differences, later games (such as Jet Moto 3) will display glitched graphics when run on VRAM-based systems. If you want to know the details, Martin Korth’s ‘Nocash PSX Specifications’ documents the different timings and such.

Drawing the scene

If you’ve been reading the Sega Saturn article, let me tell you that the design of this GPU is a lot simpler!

Now, to show how a scene is drawn, I’ll mainly use Insomniac’s Spyro: Year of the Dragon as an example. Please bear in mind that the internal resolution of this game is too cramped (292x217 px), preventing me from clearly dissecting it at each stage, so I’ve upscaled it a bit for demonstration purposes. Here is a sample at original scale if you are curious.

Basic GPU pipeline

To start with, the CPU sends geometry data (vertices) to the GPU by filling its internal 64 byte FIFO buffer with commands (up to three). Essentially, a command states how and where to draw one primitive.
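
As a rough illustration of what such a command might look like in memory, the sketch below packs one flat-coloured triangle into four 32-bit words. The word layout (opcode 0x20 in the top byte, colour in the lower three bytes, one word per vertex with Y in the upper half) follows community documentation and is an assumption here, as are the function names.

```python
import struct

FIFO_BYTES = 64  # size of the GPU command FIFO, as stated above

def pack_vertex(x: int, y: int) -> int:
    """Pack integer X/Y into one 32-bit word (Y in the upper half, X in the lower half)."""
    return ((y & 0xFFFF) << 16) | (x & 0xFFFF)

def flat_triangle(rgb, v0, v1, v2) -> bytes:
    """Build one hypothetical 'flat-shaded triangle' command as little-endian words."""
    r, g, b = rgb
    header = (0x20 << 24) | (b << 16) | (g << 8) | r   # assumed opcode 0x20
    words = [header, pack_vertex(*v0), pack_vertex(*v1), pack_vertex(*v2)]
    return struct.pack("<4I", *words)

cmd = flat_triangle((255, 0, 0), (10, 20), (100, 20), (55, 90))
words = struct.unpack("<4I", cmd)
```

Under these assumptions a single triangle takes 16 bytes, so it fits comfortably within the 64-byte FIFO.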

Once the geometry is received, clipping is applied to skip operations over unseen polygons (residing outside the camera’s viewport).

The location of the primitive is stated with an X/Y coordinate system pointing to the frame buffer. The PS1’s GPU employs an integer coordinate model, where each coordinate corresponds to the centre point of a pixel (called sampling point). In other words, there are no fractional coordinates.

Once finished, the GPU writes the pixels into the frame buffer area in VRAM, which is in turn picked up by the video encoder and broadcast to the screen.

Designs

Let’s take a break now from all this theory. Here are some examples of game characters designed from the ground up for the 3D era. They are interactive, so I encourage you to check them out!

Spyro the Dragon (1998)
413 triangles

Crash Bandicoot (1996)
732 triangles

Playing with VRAM

With the available amount of VRAM (1 whole megabyte), one could allocate a massive frame buffer of 1024×512 pixels with 16-bit colours or one of roughly 682×512 pixels with 24-bit colours (the most that fits in 1 MB at 3 bytes per pixel), allowing games to draw the best frames any game has ever shown… This sounds pretty impressive, right? Well, it does raise a couple of issues, for instance:

  • Those dimensions will have to be rescaled to follow a standardised definition (i.e. 480 NTSC, 576 PAL) so the video encoder can broadcast it to consumer TVs.
  • How is the GPU going to be able to draw anything decent if there isn’t any space left for the rest of the materials (i.e. textures, colour tables, etc)?
  • The PS1’s GPU can only draw frame buffers with up to 640×480 pixels and 16 bpp colours.

All right, so let’s have a 16 bpp 640x480 buffer instead, which leaves 424 KB of VRAM for materials. So far so good? Again, such resolution may be fine on CRT monitors, but not particularly noticeable on those 90s TVs everyone had at their homes. Then, is there any way to optimise the frame-buffer? Introducing adjustable frame-buffers.

VRAM visualisation on the NO$PSX debugger.
You can spot the dual frame buffers, along with textures (to be translated with a colour-lookup table).

In essence, instead of wasting valuable VRAM by using ‘unappreciated’ resolutions, this console’s GPU allows decreasing the dimensions of the frame buffer to effectively increment the space available for other resources. In ‘Gears Episode 2’ (see ‘Sources’ section), Halkun shows a setup that divides the 640x480 frame-buffer into two 320x480 ones, and then relies on a technique called page-flipping to render multiple scenes at the same time.

Page-flipping consists of switching the location of the frame for display between the two available whenever the game wants it, allowing the game to render one scene while displaying another. Thus, hiding any flickering effect and improving loading times (something that the player will certainly appreciate!).
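
A minimal sketch of page-flipping follows. The `PageFlipper` class and its methods are hypothetical names; the real hardware simply changes which region of VRAM the video encoder reads from.

```python
class PageFlipper:
    """Two frame buffers: one is displayed while the other is drawn into."""

    def __init__(self, width: int, height: int):
        self.buffers = [[[0] * width for _ in range(height)] for _ in range(2)]
        self.display_index = 0          # buffer currently sent to the screen

    @property
    def draw_index(self) -> int:
        return 1 - self.display_index   # the hidden buffer

    def draw_pixel(self, x: int, y: int, colour: int) -> None:
        self.buffers[self.draw_index][y][x] = colour

    def flip(self) -> None:
        """Swap roles: the finished frame becomes visible in one step."""
        self.display_index = self.draw_index

    def visible(self):
        return self.buffers[self.display_index]

fb = PageFlipper(320, 480)
fb.draw_pixel(5, 7, 0x7FFF)            # drawn off-screen
before = fb.visible()[7][5]            # still 0: the viewer sees the old frame
fb.flip()
after = fb.visible()[7][5]             # now 0x7FFF
```

Because drawing only ever touches the hidden buffer, the viewer never sees a half-finished frame.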

Overall, Halkun’s layout only consumes 600 KB of VRAM. The rest (424 KB) can be used to store colour lookup tables and textures that, combined with 2 KB of texture cache available, results in a very convenient and efficient setup.
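
The figures quoted above can be checked with a few lines of arithmetic. The helper name `framebuffer_bytes` is just for illustration.

```python
VRAM_BYTES = 1024 * 1024      # 1 MB of VRAM
KB = 1024

def framebuffer_bytes(width: int, height: int, bpp: int) -> int:
    """Size of one frame buffer in bytes."""
    return width * height * bpp // 8

single = framebuffer_bytes(640, 480, 16)          # one 640x480 buffer
double = 2 * framebuffer_bytes(320, 480, 16)      # two 320x480 buffers for page-flipping
left_over = VRAM_BYTES - double

print(single // KB, double // KB, left_over // KB)  # 600 600 424
```

Both layouts occupy the same 600 KB, but the second one buys page-flipping at the cost of horizontal resolution.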

Finally, it is worth mentioning that VRAM can be mapped using multiple colour depths simultaneously, meaning that programmers can allocate a 16 bpp frame buffer next to 24 bpp bitmaps (used by FMV frames, for instance). This is another feature facilitating further optimisation of space.

Secrets and Limitations

Whereas the PS1 had a very simple and suitable architecture, problems ended up arising anyway. Surprisingly, certain issues were tackled with very clever workarounds!

Textures jiggle a bit while they move
Square Soft’s Final Fantasy VIII (1999)

The routines used for handling geometry and applying textures are known to carry some inaccuracies.

Firstly, the rasteriser only handles pixel units: While vertex coordinates are integers, the calculated triangle edges may only occupy a fraction of a pixel. However, the rasterizer will only draw the pixel if the triangle area covers the pixel’s sampling point, and it won’t keep track of the fraction occupied. This brings up some problems:

  • External edges of models make sudden jumps when moved slightly.
  • Triangles within a mesh (sharing the same vertices and edges) will be ‘fighting’ to draw over the same pixels. With the ordering table, the GPU will draw following a ‘last-come first served’ basis, which can lead to triangle intersections flickering or overlapping when moved slightly.

This is commonly tackled by implementing sub-pixel resolution, from which the rasteriser keeps track of the fractions of pixels occupied by each triangle area. Consequently, anti-aliasing methods can then be added to soften jagged edges or sudden changes in colour.
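
The sudden jumps mentioned earlier can be reproduced with a toy rasteriser. This is a sketch with hypothetical names (`edge`, `covered_pixels`), assuming vertices are rounded to whole pixels and a pixel is drawn only when its centre falls inside the triangle; it does not claim to mirror the GPU’s actual implementation.

```python
def edge(ax, ay, bx, by, px, py):
    """Signed area test: the sign tells on which side of the edge A->B the point P lies."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)

def covered_pixels(tri, width=16, height=16, snap=True):
    """Return the set of pixels whose sampling point (the centre) is inside the triangle."""
    if snap:  # integer coordinate model: vertices are rounded to whole pixels
        tri = [(round(x), round(y)) for x, y in tri]
    (ax, ay), (bx, by), (cx, cy) = tri
    pixels = set()
    for y in range(height):
        for x in range(width):
            px, py = x + 0.5, y + 0.5          # sampling point of this pixel
            w0 = edge(ax, ay, bx, by, px, py)
            w1 = edge(bx, by, cx, cy, px, py)
            w2 = edge(cx, cy, ax, ay, px, py)
            inside_ccw = w0 >= 0 and w1 >= 0 and w2 >= 0
            inside_cw = w0 <= 0 and w1 <= 0 and w2 <= 0
            if inside_ccw or inside_cw:
                pixels.add((x, y))
    return pixels

triangle = [(1, 1), (9, 1), (1, 9)]
base = covered_pixels(triangle)
small = covered_pixels([(x + 0.4, y) for x, y in triangle])   # nudged by 0.4 px
big = covered_pixels([(x + 0.6, y) for x, y in triangle])     # nudged by 0.6 px
```

A nudge of 0.4 pixels changes nothing at all, whereas 0.6 pixels moves the whole shape by a full pixel. A rasteriser with sub-pixel precision would instead change the covered set gradually as the triangle moves.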

Moving on, the ordering table puts the burden on the developer/program to show the geometry in the right order. In some cases, the calculations implemented rely on too many approximations to gain performance. This may result in flickering or occluded surfaces that should have been displayed.
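
To show where those approximations can creep in, here is a simplified sketch of an ordering table. The bucket count, the use of an average depth per primitive and every name in the code are assumptions for illustration only.

```python
def build_ordering_table(primitives, buckets=8, max_depth=1024):
    """Place each primitive in a bucket chosen from its average depth (an approximation)."""
    table = [[] for _ in range(buckets)]
    for name, depths in primitives:
        average = sum(depths) // len(depths)
        index = min(average * buckets // max_depth, buckets - 1)
        table[index].append(name)
    return table

def draw_order(table):
    """Walk the table from the farthest bucket to the nearest one."""
    order = []
    for bucket in reversed(table):
        order.extend(bucket)
    return order

scene = [
    ("floor", (900, 950, 1000)),   # far away
    ("crate", (400, 420, 440)),
    ("hero",  (100, 120, 110)),    # close to the camera
    ("lamp",  (410, 430, 450)),    # lands in the same bucket as the crate
]
table = build_ordering_table(scene)
order = draw_order(table)
```

Here the crate and the lamp share a bucket, so their relative order depends on when they were submitted rather than on their true depth. With coarse buckets, that is exactly the kind of case that can produce flickering or wrongly hidden surfaces.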

Furthermore, setting a low-resolution frame buffer may amplify all these aliasing issues.

Finally, as you know, affine transformations have no sense of depth, which may confuse the user’s perception when the camera is close to the model and is positioned perpendicular to the viewer. The effect is also referred to as texture warping. Hence, some games resorted to tessellation (dividing a big polygon into smaller ones) to reduce distortion, others just swapped textures in favour of solid colours. In general, a GPU ultimately tackles this issue by implementing perspective correction, which interpolates textures using the depth value.
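
The difference between the two approaches can be seen by interpolating a single texture coordinate along one edge. This is a sketch with hypothetical function names and made-up depth values, not code from any particular renderer.

```python
def affine_u(u0, u1, t):
    """Interpolate a texture coordinate linearly in screen space (no depth)."""
    return u0 + (u1 - u0) * t

def perspective_u(u0, z0, u1, z1, t):
    """Interpolate u/z and 1/z linearly, then divide to recover u."""
    u_over_z = affine_u(u0 / z0, u1 / z1, t)
    one_over_z = affine_u(1 / z0, 1 / z1, t)
    return u_over_z / one_over_z

# A textured edge running from a near point (z = 1) to a far point (z = 4).
u0, z0 = 0.0, 1.0
u1, z1 = 64.0, 4.0
mid_affine = affine_u(u0, u1, 0.5)                  # 32.0
mid_correct = perspective_u(u0, z0, u1, z1, 0.5)    # 12.8
```

Halfway across the screen, the affine method samples the middle of the texture (32), while the perspective-correct result is 12.8, because the near part of the surface occupies more of the screen. That gap is what shows up as warping, and tessellation shrinks it by keeping the depth range of each polygon small.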

Video out

The first revision of this console carries a surprising amount of video signals with the following ports:

  • RFU DC: This one got removed pretty quickly. It was meant to be connected to an RF modulator.
  • RCA: Provides composite video.
  • S-Video: Provides Luma + Sync (combined) and Chroma.
  • AV Multi Out: Provides all the previous signals, except RFU, plus RGB and a +5 Volts line.

Later revisions of the console removed these ports and at the end, only ‘AV Multi Out’ was left.


Audio

Sony’s signature Sound Processing Unit (SPU) takes care of this. This chip supports an impressive 24 channels of 16-bit ADPCM samples (a more efficient version of the well-known PCM sampling) with a sampling rate of 44.1 kHz (Audio CD quality).

This chip also provides the following capabilities:

  • Pitch modulation: As the name suggests, games can automatically alter the pitch of their samples instead of needing to store extra ones. This is useful for music sequencing.
  • Frequency modulation: Voices can be assigned to alter the frequency of others. Comparable to FM synthesis.
  • ADSR Envelope: These are a set of properties available for amplitude modulation.
  • Looping: Instructs the system to play a piece of audio repeatedly.
  • Digital reverb: Simulates the sample being played within a specific atmosphere to immerse the player.

512 KB of DRAM (called ‘Sound RAM’) are provided as an audio buffer. This memory is accessible from the CPU (only through DMA) and the CD controller. However, games only have 508 KB available to store samples; the rest is reserved by the SPU to process Audio CD music. This amount is reduced even further if reverb is activated.
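
To get a feel for how far 508 KB stretches, the sketch below estimates how many seconds of a single mono voice fit in Sound RAM. The ADPCM block size (16 bytes holding 28 samples) is taken from community documentation and should be treated as an assumption.

```python
SOUND_RAM_FOR_SAMPLES = 508 * 1024   # bytes left for games, as stated above
SAMPLE_RATE = 44_100                 # samples per second
BLOCK_BYTES = 16                     # assumed size of one ADPCM block
SAMPLES_PER_BLOCK = 28               # assumed samples decoded from one block

blocks = SOUND_RAM_FOR_SAMPLES // BLOCK_BYTES
samples = blocks * SAMPLES_PER_BLOCK
seconds_adpcm = samples / SAMPLE_RATE

# The same memory filled with raw 16-bit PCM (2 bytes per sample) for comparison.
seconds_pcm = (SOUND_RAM_FOR_SAMPLES // 2) / SAMPLE_RATE

print(f"ADPCM: {seconds_adpcm:.1f} s, PCM: {seconds_pcm:.1f} s")  # ADPCM: 20.6 s, PCM: 5.9 s
```

Under these assumptions ADPCM stores 3.5 times more audio than raw PCM, yet the total is still only about 20 seconds at full rate. This helps explain why pitch modulation and looping are so valuable.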

The CD controller is also able to send samples directly to the audio mixer without going through the audio buffer or requiring CPU intervention. Samples can also be compressed using the ‘XA’ encoding, which the SPU can decode on the fly.

The streaming era

Similarly to the Saturn, games are no longer dependent on music sequencing or pre-defined waveforms, and thanks to the amount of storage available on the CD-ROM medium, developers can store fully produced samples and just stream them to the audio chip.

There are two I/O ports (Serial and Parallel) available for add-ons. However, these were removed in later revisions of the console due to lack of adoption and the fact that they could potentially be used to crack the copy protection system.

CD subsystem

The block controlling the CD drive is an interesting area: you can imagine it as a separate computer living inside the PlayStation.

CD Subsystem layout

This subsystem is composed of:

  • A DSP: controls the motor and laser, and processes the RF signal coming from the laser.
  • A Sub-CPU: A CPU package made of a Motorola 68HC05 microcontroller, 512 B of RAM and 16 KB of ROM. In a nutshell, the Sub-CPU runs a local program stored in ROM and controls the DSP. The Sub-CPU program implements copy-protection measures and they are enforced whether the main CPU ‘likes it or not’.
  • A CD Controller: This is the middle-man between the CD subsystem and the rest of the console, receiving commands from main CPU (in a FIFO manner) and triggering interrupts after certain events. As a controller, the chip talks to the Sub-CPU and receives CD data from the DSP. Furthermore, the controller contains a DMA unit and it’s connected to the SPU, which allows it to stream audio directly.
  • 32 KB of SRAM connected to the controller: This is presumably used as a buffer for data read from the disc.

The subsystem somewhat resembles a typical CD reader everyone had at their home, except with the tweaks Sony implemented in the Sub-CPU program to perform anti-piracy checks.

Front ports

The controller and the Memory Card slots are electrically identical, so the address of each one is hardcoded. Additionally, Sony altered the physical shape of the ports to avoid accidents.

Communication with these devices is accomplished using a serial interface. Commands sent from the console will be delivered to one of the two slots (either ‘mem. card 0’ and ‘controller 0’, or ‘mem. card 1’ and ‘controller 1’). Then, both accessories will answer with their unique identifiers; this allows the console to focus on a particular type of device (memory card or controller) from then on.


Operating System

The system includes a 512 KB ROM that stores a ‘BIOS’. This program performs many services, including taking care of the startup process, displaying a user shell and finally, exposing a collection of I/O routines.

BIOS/Kernel

The BIOS is a critical dependency for games, as this program bootstraps them from the CD drive. Moreover, the BIOS serves as a ‘middle man’ to interact with the console’s hardware. The latter methodology is similar to what IBM implemented with their IBM PC BIOS, which encouraged developers to make use of a standard interrupt table (containing I/O routines) instead of platform-dependent I/O ports.

Having said that, the PS1 BIOS exposes routines such as:

  • Commands for the CD-ROM drive.
  • Filesystem operations (from CD-ROM and memory card).
  • Multithreading.
  • Standard C functions (string manipulation, memory operations, etc).

Since BIOS ROM access is very slow (it’s connected to an 8-bit data bus), the APIs are packaged in the form of a Kernel and copied to main RAM during boot. Thus, 64 KB of main RAM are reserved for said Kernel. By the way, the Kernel is also referred to as PlayStation OS.

Boot process

The CPU’s reset vector is at 0xBFC00000, which points to the BIOS ROM.
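
The address 0xBFC00000 looks odd until it is translated into a physical address. The sketch below assumes the standard 32-bit MIPS segment convention (KSEG0 and KSEG1 map onto physical memory by dropping the top three bits), which the article does not spell out; the function name is hypothetical.

```python
KSEG1_BASE = 0xA0000000   # start of the uncached kernel segment on 32-bit MIPS
KSEG_MASK = 0x1FFFFFFF    # keeps the low 29 bits, which form the physical address

def kseg_to_physical(virtual: int) -> int:
    """Translate a KSEG0/KSEG1 virtual address to its physical address."""
    if not 0x80000000 <= virtual <= 0xBFFFFFFF:
        raise ValueError("address is outside KSEG0/KSEG1")
    return virtual & KSEG_MASK

RESET_VECTOR = 0xBFC00000
physical = kseg_to_physical(RESET_VECTOR)
print(hex(physical))   # 0x1fc00000
```

Since the reset vector sits above `KSEG1_BASE`, the very first instructions are fetched without going through the cache, which at that point has not been initialised yet.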

Famous splash scene after powering on the console

PlayStation logo showing after a genuine game is inserted

The shell is displayed if there isn’t any disc inserted

Similarly to the Saturn’s boot process, after receiving power, the PS1 will:

  1. Look for the BIOS ROM and execute routines to initialise the hardware.
  2. Load PlayStation OS.
  3. Display the splash screen.
  4. If there is a CD inserted, the CD-ROM controller will check if it’s genuine:
    • It is → The controller will allow its content to be read.
      1. The CPU will look for ‘SYSTEM.CNF’ and continue execution from there.
    • It’s not → The CPU will display an error message.
  5. With no CD inserted, the CPU will display the shell. The user is now in control.
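
As an illustration of the ‘SYSTEM.CNF’ step, the sketch below parses a made-up file and extracts the name of the executable to boot. The sample contents and key names follow what community documentation commonly shows and are assumptions, as are the function names.

```python
SAMPLE_CNF = """\
BOOT = cdrom:\\GAME.EXE;1
TCB = 4
EVENT = 16
STACK = 801FFF00
"""

def parse_system_cnf(text: str) -> dict:
    """Split 'KEY = value' lines into a dictionary, ignoring blank lines."""
    settings = {}
    for line in text.splitlines():
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip().upper()] = value.strip()
    return settings

def boot_executable(settings: dict) -> str:
    """Extract the executable name from the BOOT entry (drops device prefix and version)."""
    path = settings["BOOT"]
    name = path.split(":", 1)[-1].lstrip("\\")
    return name.split(";", 1)[0]

config = parse_system_cnf(SAMPLE_CNF)
print(boot_executable(config))   # GAME.EXE
```

The remaining entries in this made-up file would describe kernel resources, such as the initial stack address.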

The shell is a simple graphical interface that enables the user to copy or delete saves from the memory card; or play an audio CD.


Games

Programs have all the facilities that the CD medium provides: Large storage (640 MB), good audio quality and a ‘not-so-slow’ read speed thanks to the 2x drive.

Development ecosystem

The official SDK provided C libraries which are linked to BIOS routines to access the hardware. If you wonder, this is the main factor that helped to emulate the PS1 on a wide range of platforms.

Along with the SDK, Sony also distributed specialised hardware like the DTL-H2000: a dual-slot ISA card containing the internals and I/O of the PS1, plus extra circuitry for debugging purposes. The board has access to the host’s hard drive and can execute PS1 software without restrictions. Software and drivers used to communicate with the card ran on PCs with Windows 3.1 or 95.


Anti-piracy / Region Lock

Like any other optical media, in order to fetch data from a CD, a laser beam is used to read the pits (zeroes) and lands (ones) from the track of the disc. Now, conventional discs are not 100% flat and they often have tiny fluctuations in their tracks. These defects are completely unnoticeable while reading the data since lasers can automatically calibrate themselves as they read.

This is what Sony based their copy protection on: The Sub-CPU will only allow reading discs whose Table of Contents (TOC) is engraved using a defined frequency informally known as Wobble Groove, which is only applied during mastering and cannot be replicated through conventional burners. The TOC is found in the inner section of the CD (called ‘Lead-In’ area) and instructs the laser on how to navigate throughout the disc. It’s repeated many times as a fault-tolerance mechanism.

Within the PS1 game’s TOC, one of the following character strings is embedded:

  • SCEA → Sony Computer Entertainment of America.
  • SCEE → Sony Computer Entertainment of Europe.
  • SCEI → Sony Computer Entertainment of Japan.

As you can imagine, the reader applies region-locking using this technique as well.
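
The decision itself is simple enough to model in a few lines. This sketch only mimics the logic; the real check runs inside the Sub-CPU program against the wobble signal, and the names below are hypothetical.

```python
REGION_STRINGS = {
    "SCEA": "America",
    "SCEE": "Europe",
    "SCEI": "Japan",
}

def disc_region(toc_string: str):
    """Return the region for a licence string, or None if it is not recognised."""
    return REGION_STRINGS.get(toc_string)

def can_boot(toc_string: str, console_region: str) -> bool:
    """A disc boots only if its string is valid and matches the console's region."""
    return disc_region(toc_string) == console_region

print(can_boot("SCEE", "Europe"))   # True
print(can_boot("SCEA", "Europe"))   # False
```

A single string comparison therefore covers both jobs: a missing or unknown string fails the authenticity test, and a valid string from another territory fails the region test.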

Defeat

On the other hand, this check is only executed once at the start, so manually swapping the disc just after passing the check can defeat this protection… with the risk of damaging the drive. Later on, some games took matters into their own hands and often reinitialised the drive in-game so the check would be executed again. This was done in an effort to prevent users from performing the ‘swap trick’.

Alternatively, tiny boards programmed to mock the wobble signal could be soldered in the console. These boards are known as Modchips and, while legally questionable, they were incredibly popular.

Retaliation

The use of emulators was seen as a threat for publishers as well. As a result, some games included their own checks (mostly checksums) to combat any type of unauthorised use or modification.

One of the checks I was told about consisted of deliberately reinitialising the drive and then making it read specific sectors that would not pass the wobble groove check. If this managed to unlock the drive anyway, the game (still residing in RAM) would happily reveal its anti-piracy material. Notice that this approach can also affect modded consoles using genuine games.

Later on, Sony provided a library called Libcrypt which fortified copy protection with the use of two approaches:

  • From the hardware side, checksums of sectors are stored in sub-channels of the disc.
    • CD-ROM sub-channels traditionally store metadata, mostly to guide the drive. These aren’t user accessible and conventional burners rarely allow writing over them manually.
  • From the software side, a set of routines that get the checksum values and mix them with others are embedded at different points of the game. This attempted to mitigate both emulators and modchips.

That’s all folks

