A Glimpse Inside the Cell Processor
XenoPhage writes "Gamasutra has posted an article by Jim Turley about the design of the Cell processor, the main processor of the upcoming PlayStation 3. It gives a decent overview of the structure of the Cell processor itself, including the CBE, PPE, and SPE units." From the article: "Remember your first time? Programming a processor, that is. It must have seemed both exciting and challenging. You ain't seen nothing yet. Even garden-variety microprocessors present plenty of challenges to an experienced programmer or development team. Now imagine programming nine different processors all at once, from a single source-code stream, and making them all cooperate. When it works, it works amazingly well. But making it work is the trick."
Re:eh (Score:2)
Re:eh (Score:4, Insightful)
Nope, console gamers buy consoles because they offer games that don't appear on the PC and/or because they don't have the money to buy a PC gaming rig. Spending $1200+ (I'm talking about building from the ground up with reliable and decent parts) just to start getting a decent computer together usually isn't as justifiable as spending ($100: GC, $130: DS, $150: PS2/Xbox, $200: PSP, $400: 360) on a console of some sort.
Re:eh (Score:1)
Re:eh (Score:2)
I'll pay a premium to play games comfortably, thank you.
Re:eh (Score:2)
The cost of a gaming rig isn't just in the building - it's also in the maintaining of a PC gaming rig. To continue to play the newest FPS you constantly need to upgrade processors and especially GPUs. Over a longer period you will probably have to replace your whole mobo to be able to use the new processor socket or chipset and get the new AGP4x, AGP8x, PCIe slot advancement. Over that s
Re:eh (Score:2)
If you're willing to compromise, a video card will hold out fairly well for quite a bit longer, and you'll probably still see better graphics than you'd get on any console.
Nevertheless, a PC is significantly more expensive than a console, although the PS3 is doing a good job of changing that. The point is that a PC does far more than any console will ever do. And if all
Re:eh (Score:2)
Considering that Gamasutra is the website for Game Developer Magazine... Yes, I think the author really did expect that an appreciable percentage of his readers would be programmers.
Oh yeah, I remember my first time (Score:4, Funny)
Re:Oh yeah, I remember my first time (Score:5, Funny)
Perchance, did you have a MySpace account, and your parents didn't know about your little shindig?
Re:Oh yeah, I remember my first time (Score:5, Funny)
Re:Oh yeah, I remember my first time (Score:3, Funny)
Re:Oh yeah, I remember my first time (Score:2)
Re:Oh yeah, I remember my first time (Score:2)
She must have taught you how to use Unix or OS X for the first time, since that is the only believable answer. (This is Slashdot, after all).
The article's author is huffing crack here... (Score:2)
Re:The article's author is huffing crack here... (Score:1)
Thus, you're right: parallelization is the answer (at least according to the Cell design philosophy). Because it's possible to put so many transistors on there, the way to do it without running into as many problems would be to cr
Re:The article's author is huffing crack here... (Score:4, Insightful)
I wasn't talking so much about the article as a whole, but the insane levels of hyperbole in the particular paragraph I quoted. "We're capable of putting more transistors on a chip than we can think of things to do with". That's not even vaguely true.
More transistors == more power, all else being equal, because it's all those junctions flipping state so quickly that uses the power.
As for the insanity of Intel's processors... that seems to be a perversion particular to Intel. In the past three decades that I've been following the industry, Intel has only managed to produce *one* sane CPU design, the i960, and they promptly caponised it by removing the MMU and relegating it to embedded controls lest it outcompete their cash cow.
The rest... from the 4004 through the 8080, the 8086 and its many descendants, iAPX 432, i860, and Itanium... have been consistently outperformed by chips with smaller transistor budgets built by companies with far fewer resources. They only occasionally broke past the midrange of the RISC chips, and were usually trailing back with the anemic SPARC. Where they have excelled has been marketing and the breadth of their support... both hardware and business. IBM went with the 8088 because they could get them in quantity and they could get good cheap support chips for them: if you went with Motorola or Zilog or Western Digital or National Semiconductor you pretty much had to go back to Intel to build the rest of your computer anyway.
Re:The article's author is huffing crack here... (Score:2)
Re:The article's author is huffing crack here... (Score:3, Insightful)
The mesh is common to all the processors and not that big, so it can be broadcast. Textures are the big chunk, but most pieces will only need high-resolution versions of the textures in their direct view... unless a processor is looking at an optically interesting surface (for reflections or refractions), it can get by with mesh-resolution approximations to the tex
Re:The article's author is huffing crack here... (Score:2)
You neglected to mention the primary reason this is true; you don't have to do anything fancy, because it's fairly rare that we even need to parallelize rendering a single frame these days - most rendering involves big bulk numbers of frames which are later assembled into a video. You can always send individual frames to clients. Thus you can parallelize it without even doing anything hard.
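The frame-per-client idea above can be sketched in a few lines. This is only an illustration under stated assumptions: `render_frame` and `render_animation` are hypothetical names, the "renderer" just produces placeholder bytes, and a thread pool stands in for the separate render clients a real farm would use.

```python
from concurrent.futures import ThreadPoolExecutor


def render_frame(frame_number: int) -> tuple[int, bytes]:
    """Hypothetical stand-in for a renderer: returns the frame number and fake pixels."""
    # Placeholder work; a real renderer would rasterize or ray trace here.
    pixels = bytes((frame_number * 7 + i) % 256 for i in range(16))
    return frame_number, pixels


def render_animation(frame_count: int, workers: int = 4) -> list[bytes]:
    """Render every frame independently, then assemble them in frame order."""
    # Threads stand in for render clients; no frame depends on any other frame.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(render_frame, range(frame_count)))
    # Sort by frame number so assembly does not rely on completion order.
    results.sort(key=lambda item: item[0])
    return [pixels for _, pixels in results]


if __name__ == "__main__":
    frames = render_animation(8)
    print(f"rendered {len(frames)} frames")
```

Because the frames share no state, the only coordination needed is putting the results back in order, which is why this kind of parallelism is considered the easy case.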
Re:memory speed? (Score:2, Informative)
Re:memory speed? (Score:1)
You spread a bunch of FUD about the PS3 and get +1 informative.
Then you correct your FUD, and also get +1 informative.
Gotta love Slashdot...
Re:memory speed? (Score:1)
Re:memory speed? (Score:5, Informative)
This is the speed at which the Cell can read RSX's local memory. Memory bandwidth for the Cell itself is ~25 GB/sec. If the Cell ever wants to access the private RAM of the RSX (why?) it *is* possible, but it's a lot more efficient to use the normal pathway through main memory...
Simon.
Re:memory speed? (Score:2)
I don't remember what read speed AGP had, but it was certainly asymmetric wrt writing.
Re:memory speed? (Score:1, Informative)
Re:memory speed? (Score:2)
Re:memory speed? (Score:2)
Debunked in same article (Score:2)
25GB/sec, not 16MB/sec.
MOD PARENT TROLL (Score:1, Interesting)
Re:memory speed? (Score:2)
Sega Saturn Redux? (Score:4, Interesting)
Re: (Score:1)
Re:Sega Saturn Redux? (Score:3, Interesting)
First of all, their dream of a general 'octopiler' is pure fantasy. I have written massively parallel MPI and Shared Memory applications and can testify to their complexity. Mapping an arbitrary piece of code transparently to multiple processors is an extr
Re:Sega Saturn Redux? (Score:3, Interesting)
Your assumption would be wrong.
Re:Sega Saturn Redux? (Score:2, Interesting)
"The PPE which is capable of running a conventional operating system has control over the SPEs and can start, stop, interrupt and schedule processes running on the SPEs. To this end the PPE has additional instructions relating to control of the SPEs. De
Re:Sega Saturn Redux? (Score:2)
My pleasure.
Yeah, the PPE has to kickstart an SPE, but after that, you can treat the SPE as totally autonomous. They can fetch their own code and data, and what more do you need than that? You don't have to, you can manage them pretty much any way you want to. The PPE can halt an SPE, but that's a really inefficient way of doing things. Think of the size of the context you'd have to swap out to have the PPE control the threading on the SPEs.
Also I'd
Re:Sega Saturn Redux? (Score:2)
The Cyber architecture [wikipedia.org] had typically two main CPUs (60-bit), and 12-20 "Peripheral Processing Units", which were much lower capacity, 12-bit processors. The CPUs were started and stopped by the PPUs, and had no interrupt architecture. Control of the system was actually in the PPUs, they loaded programs into memory, set up memory mapping, handled context switches and system requests. PPUs themselves were implemented as shared hardware with multiple contexts, and control actually changed after each instruc
Re:Sega Saturn Redux? (Score:2)
The programming model for the SPEs is fairly straightforward. You bundle some code and some data into an APUlet, and upload it via the ring bus to the SPE. The SPE runs that code for some amount of time, and can communicate with the rest of the chip either by sending messages over the ring bus (using a mailbox mechanism), or doing DMAs.
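As a rough illustration of the "bundle code and data, upload, wait for a mailbox message" flow described above, here is a toy simulation. It is not Cell SDK code: `FakeSPE`, `upload`, and `run_on_spes` are hypothetical names, threads stand in for SPEs, and Python queues stand in for both the ring bus and the mailboxes (real mailbox messages are small, with bulk data moved by DMA).

```python
import queue
import threading


class FakeSPE:
    """Toy stand-in for one SPE: runs an uploaded job and replies via a mailbox."""

    def __init__(self, spe_id: int) -> None:
        self.spe_id = spe_id
        self.inbox = queue.Queue()    # jobs "uploaded" by the controlling core
        self.mailbox = queue.Queue()  # messages sent back to the controller
        self.thread = threading.Thread(target=self._run, daemon=True)
        self.thread.start()

    def _run(self) -> None:
        while True:
            job = self.inbox.get()
            if job is None:           # shutdown signal
                break
            kernel, data = job        # the bundled code and data
            self.mailbox.put((self.spe_id, kernel(data)))

    def upload(self, kernel, data) -> None:
        """Hand a (code, data) bundle to this fake SPE."""
        self.inbox.put((kernel, data))

    def stop(self) -> None:
        self.inbox.put(None)
        self.thread.join()


def sum_of_squares(chunk):
    """Example kernel: the code half of the bundle."""
    return sum(x * x for x in chunk)


def run_on_spes(values, spe_count=4):
    """Split the data, give one slice to each fake SPE, and add up the replies."""
    spes = [FakeSPE(i) for i in range(spe_count)]
    for i, spe in enumerate(spes):
        spe.upload(sum_of_squares, values[i::spe_count])
    total = sum(spe.mailbox.get()[1] for spe in spes)
    for spe in spes:
        spe.stop()
    return total


if __name__ == "__main__":
    print(run_on_spes(list(range(100))))
```

The controller only uploads work and reads replies; each worker runs its bundle without further supervision, which is the property the comment is pointing at.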
Re:Sega Saturn Redux? (Score:4, Interesting)
(Having the computer model itself to the problem reduces the complexity of programming and will make optimal use of the hardware. Having the program model itself after what the computer is tuned to do is merely an ugly hack and requires ugly compilers to specifically translate between the paradigms.)
The Cell processor is designed around 1980s concepts of load-balancing while keeping to many of the rules of second-generation programming. Technology has moved on. That's not to say the Cell is bad. It's a definite improvement over the 1960s concepts used in many modern CPUs. However, it is still 20 years behind the curve. C'mon, guys, this isn't the Space Shuttle, it's a microprocessor. There is no excuse for network and design technology to be so far beyond the best of the best that industrial giants are capable of doing.
Actually, it's worse than that. Modern multi-processor systems require specially-designed chipsets and become exponentially more expensive as you build them up. Single boards don't usually go beyond 16 processors. In comparison, people built single boards with 1024 Transputers without difficulty, with costs increasing linearly. So, in multi-processor architectures, we can't even match everything that could be done in the 1980s.
How does this affect those using the Cell? Well, that's simple. It doesn't offer enough of an added advantage and is different enough that coders will have difficulty making good use of it. That means that coders will have to be inefficient OR dedicated to that one chip, which has no guarantee of making any money for them. Coders won't bother, unless there is something out there that will make it a guaranteed success. I'm not seeing this killer demo.
Re:Sega Saturn Redux? (Score:2)
Functionally? Maybe. But considering the 20% yields, would you rather lose 1/8th of the chip, or the whole thing? Also, I imagine managing the cache for that on the fly would be a significantly larger headache than dividing it up in this more consistent way; associative lookup can take up a lot of real estate real quick.
Re:Sega Saturn Redux? (Score:2)
If you did this wit
Re:Sega Saturn Redux? (Score:2)
Unless, of course, you're using AMD processors, which have HyperTransport links, and such systems become linearly more expensive as you build them up. Give or take.
In order to get the best performance out of Hammer and HT you have to link the processors more than in a line, but since it's a NUMA system you can simply link them end to end. It will not be an efficient architecture for mos
Re:Sega Saturn Redux? (Score:2)
It's not the same thing *at all*. CPUs are highly non-linear. 8 2-way processors are much simpler than 1 16-way processor. CPU structures tend to scale with the square of their width. A front-end capable of issuing 32 instructions per cycle
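To put rough numbers on the "square of the width" claim: a back-of-the-envelope sketch, assuming complexity grows exactly as width squared (the comment only says "tend to"). `relative_cost` is a hypothetical helper, and the units are arbitrary.

```python
def relative_cost(width: int, exponent: float = 2.0) -> float:
    """Relative complexity of one core, assuming cost grows as width**exponent."""
    return float(width) ** exponent


many_narrow = 8 * relative_cost(2)   # eight 2-way cores
one_wide = relative_cost(16)         # one 16-way core

print(many_narrow, one_wide, one_wide / many_narrow)
```

Under that assumption, eight 2-way cores cost 32 units against 256 for a single 16-way core, a factor of eight for the same nominal issue width.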
Re:Sega Saturn Redux? (Score:2)
Re:Sega Saturn Redux? (Score:2)
PS) Yes, I am a programmer. I think many discussions of Cell take it for granted that multithreaded prog
Saturn was less well planned (Score:3, Interesting)
Also, Sony is hard at work on dev kits which will make programming for the Cell much easier. How well they succeed in making these dev kits will be the primary factor in how programming for the beast goes.
True, kind of (Score:2)
So what's new? (Score:1)
Not NINE processors, only EIGHT, since... (Score:5, Interesting)
Read more about the yield problems of the Cell chip here:
http://theinquirer.net/default.aspx?article=32978
Fabrication yield is estimated at only 10% to 20%, which is very low for the industry.
Re:Not NINE processors, only EIGHT, since... (Score:2, Interesting)
That's for a completely working package, the Cell plus 8 SPEs. Because of the low yield of the "perfect" processors, PS3 will be using the ones with 7 working SPEs, since there are plenty of those. The IBM discussion linked by the Inquirer shows that.
Yield is so low due not only to the complexity but also to the size: if there are an average of 10 defects on a wafer and you can only fit 10 processors on a wafer (these numbers pulled totally out of my ass) then
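The defect argument above can be made concrete with a simple Poisson yield model. This is a sketch under loud assumptions: defects land independently and uniformly over the die, the fraction of die area occupied by the SPEs (`spe_area_fraction`) is a made-up 0.5, and a die is salvageable only if all defects fall inside a single SPE. The function names are hypothetical.

```python
import math


def perfect_yield(defects_per_die: float) -> float:
    """Poisson model: probability that a die has zero defects."""
    return math.exp(-defects_per_die)


def salvage_yield(defects_per_die: float, spe_area_fraction: float = 0.5,
                  spe_count: int = 8) -> float:
    """Probability that a die is usable when one faulty SPE can be disabled."""
    # Everything outside the SPEs must be defect-free.
    other_clean = math.exp(-defects_per_die * (1.0 - spe_area_fraction))
    # Each SPE covers an equal share of the SPE area (assumption).
    spe_clean = math.exp(-defects_per_die * spe_area_fraction / spe_count)
    all_good = spe_clean ** spe_count
    one_bad = spe_count * spe_clean ** (spe_count - 1) * (1.0 - spe_clean)
    return other_clean * (all_good + one_bad)


if __name__ == "__main__":
    defects_per_die = 10 / 10  # the comment's 10 defects over 10 dies per wafer
    print(f"perfect dies:       {perfect_yield(defects_per_die):.1%}")
    print(f"7-or-8-SPE usable:  {salvage_yield(defects_per_die):.1%}")
```

With one defect per die on average, the model puts fully working dies at e^-1, or about 37%, and allowing one dead SPE raises the usable share. The exact figures depend entirely on the assumed area fraction, so treat them as illustrative only.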
Original Article Clarification Regarding Yields (Score:1)
Re:Not NINE processors, only EIGHT, since... (Score:2)
Re:Not NINE processors, only EIGHT, since... (Score:2)
Bla, bla, bla.... (Score:1)
Looks like a fun project... (Score:1)