The Xbox 360 processor was designed to give game developers the power that they actually need, in an easy to use form. The Cell processor has impressive streaming floating-point power that is of limited use for games.
The majority of game code is a mixture of integer, floating-point, and vector math, with lots of branches and random memory accesses. This code is best handled by a general purpose CPU with a cache, branch predictor, and vector unit.
The Cell's seven DSPs (what Sony calls SPEs) have no cache, no direct access to memory, no branch predictor, and a different instruction set from the PS3's main CPU. They are not designed for or efficient at general purpose computing. DSPs are not appropriate for game programming.
Xbox 360 has three general purpose CPU cores. The Cell processor has only one.
Xbox 360's CPUs has vector processing power on each CPU core. Each Xbox 360 core has 128 vector registers per hardware thread, with a dot product instruction, and a shared 1-MB L2 cache. The Cell processor's vector processing power is mostly on the seven DSPs.
Dot products are critical to games because they are used in 3D math to calculate vector lengths, projections, transformations, and more. The Xbox 360 CPU has a dot product instruction, where other CPUs such as Cell must emulate dot product using multiple instructions.
Cell's streaming floating-point work is done on its seven DSP processors. Since geometry processing is moved to the GPU, the need for streaming floating-point work and other DSP style programming in games has dropped dramatically.
Just like with the PS2's Emotion Engine, with its missing L2 cache, the Cell is designed for a type of game programming that accounts for a minor percentage of processing time.
Sony's CPU is ideal for an environment where 12.5% of the work is general-purpose computing and 87.5% of the work is DSP calculations. That sort of mix makes sense for video playback or networked waveform analysis, but not for games. In fact, when analyzing real games one finds almost the opposite distribution of general purpose computing and DSP calculation requirements. A relatively small percentage of instructions are actually floating point. Of those instructions which are floating-point, very few involve processing continuous streams of numbers. Instead they are used in tasks like AI and path-finding, which require random access to memory and frequent branches, which the DSPs are ill-suited to.
Based on measurements of running next generation games, only ~10-30% of the instructions executed are floating point. The remainders of the instructions are load, store, integer, branch, etc. Even fewer of the instructions executed are streaming floating point—probably ~5-10%. Cell is optimized for streaming floating-point, with 87.5% of its cores good for streaming floating-point and nothing else.
Game programmers do not want to spread their code over eight processors, especially when seven of the processors are poorly suited for general purpose programming. Evenly distributing game code across eight processors is extremely difficult.
Even ignoring the bandwidth limitations the PS3's GPU is not as powerful as the Xbox 360's GPU.
Below are the specs from Sony's press release regarding the PS3's GPU.
Independent vertex/pixel shaders
51 billion dot products per second (total system performance)
136 "shader operations" per clock
The interesting ALU performance numbers are 51 billion dot products per second (total system performance), 300M transistors, and more than twice as powerful as the 6800 Ultra.
The 51 billions dot products per cycle were listed on a summary slide of total graphics system performance and are assumed to include the Cell processor. Sony's calculations seem to assume that the Cell can do a dot product per cycle per DSP, despite not having a dot product instruction.
However, using Sony's claim, 7 dot products per cycle * 3.2 GHz = 22.4 billion dot products per second for the CPU. That leaves 51 - 22.4 = 28.6 billion dot products per second that are left over for the GPU. That leaves 28.6 billion dot products per second / 550 MHz = 52 GPU ALU ops per clock.
It is important to note that if the RSX ALUs are similar to the GeForce 6800 ALUs then they work on vector4s, while the Xbox 360 GPU ALUs work on vector5s. The total programmable GPU floating point performance for the PS3 would be 52 ALU ops * 4 floats per op *2 (madd) * 550 MHz = 228.8 GFLOPS which is less than the Xbox 360's 48 ALU ops * 5 floats per op * 2 (madd) * 500 MHz= 240 GFLOPS.
With the number of transistors being slightly larger on the Xbox 360 GPU (330M) it's not surprising that the total programmable GFLOPs number is very close.
The PS3 does have the additional 7 DSPs on the Cell to add more floating point ops for graphics rendering, but the Xbox 360's three general purpose cores with custom D3D and dot product instructions are more customized for true graphics related calculations.
The 6800 Ultra has 16 pixel pipes, 6 vertex pipes, and runs at 400 MHz. Given the RSX's 2x better than a 6800 Ultra number and the higher frequency of the RSX, one can roughly estimate that it will have 24 pixel shading pipes and 4 vertex shading pipes (fewer vertex shading pipes since the Cell DSPs will do some vertex shading). If the PS3 GPU keeps the 6800 pixel shader pipe co-issue architecture which is hinted at in Sony's press release, this again gives it 24 pixel pipes* 2 issued per pipe + 4 vertex pipes = 52 dot products per clock in the GPU.
If the RSX follows the 6800 Ultra route, it will have 24 texture samplers, but when in use they take up an ALU slot, making the PS3 GPU in practice even less impressive. Even if it does manage to decouple texture fetching from ALU co-issue, it won't have enough bandwidth to fetch the textures anyways.
For shader operations per clock, Sony is most likely counting each pixel pipe as four ALU operations (co-issued vector+scalar) and a texture operation per pixel pipe and 4 scalar operations for each vector pipe, for a total of 24 * (4 + 1) + (4*4) = 136 operations per cycle or 136 * 550 = 74.8 GOps per second.
Given the Xbox 360 GPU's multithreading and balanced design, you really can't compare the two systems in terms of shading operations per clock. However, the Xbox 360's GPU can do 48 ALU operations (each can do a vector4 and scalar op per clock), 16 texture fetches, 32 control flow operations, and 16 programmable vertex fetch operations with tessellation per clock for a total of 48*2 + 16 + 32 + 16 = 160 operations per cycle or 160 * 500 = 80 GOps per second.
Overall, the automatic shader load balancing, memory export features, programmable vertex fetching, programmable triangle tesselator, full rate texture fetching in the vertex shader, and other "well beyond shader model 3.0" features of the Xbox 360 GPU should also contribute to overall rendering performance.
The PS3 has 22.4 GB/s of GDDR3 bandwidth and 25.6 GB/s of RDRAM bandwidth for a total system bandwidth of 48 GB/s.
The Xbox 360 has 22.4 GB/s of GDDR3 bandwidth and a 256 GB/s of EDRAM bandwidth for a total of 278.4 GB/s total system bandwidth.
Why does the Xbox 360 have such an extreme amount of bandwidth? Even the simplest calculations show that a large amount of bandwidth is consumed by the frame buffer. For example, with simple color rendering and Z testing at 550 MHz the frame buffer alone requires 52.8 GB/s at 8 pixels per clock. The PS3's memory bandwidth is insufficient to maintain its GPU's peak rendering speed, even without texture and vertex fetches.
The PS3 uses Z and color compression to try to compensate for the lack of memory bandwidth. The problem with Z and color compression is that the compression breaks down quickly when rendering complex next-generation 3D scenes.
HDR, alpha-blending, and anti-aliasing require even more memory bandwidth. This is why Xbox 360 has 256 GB/s bandwidth reserved just for the frame buffer. This allows the Xbox 360 GPU to do Z testing, HDR, and alpha blended color rendering with 4X MSAA at full rate and still have the entire main bus bandwidth of 22.4 GB/s left over for textures and vertices.
When you break down the numbers, Xbox 360 has provably more performance than PS3. Keep in mind that Sony has a track record of over promising and under delivering on technical performance. The truth is that both systems pack a lot of power for high definition games and entertainment.
However, hardware performance, while important, is only a third of the puzzle. Xbox 360 is a fusion of hardware, software and services. Without the software and services to power it, even the most powerful hardware becomes inconsequential. Xbox 360 games—by leveraging cutting-edge hardware, software, and services—will outperform the PlayStation 3.
-- BY Douglas C Perry
Posted by XERO .ALL RIGHTS RESERVED.