Question
6. Multicore and the Memory Wall [10 marks]
Relative to each other, CPUs have few threads and enormous caches, while GPUs
have many threads and tiny caches. Generally speaking, we need to think
carefully about any hardware architecture in which many hardware threads
share a cache (usually, but not always, the last-level cache). GPUs are the
extreme example of such an architecture.
Suppose threads are cache friendly in the sense that each portion of a thread
accesses a small subset of memory, and that this subset changes relatively
slowly. But threads are normally interleaved with some frequency, rather than
being allowed to run to completion. The technical term for this slowly
changing subset is the thread's _working set_.
Now suppose that the working set of a thread is roughly the same size as the
shared cache. Further suppose that the hardware schedules threads much more
rapidly than the speed with which their working sets evolve.
As a thought experiment, in which case is a thread more likely to find its data
in cache:
Case i: We schedule many threads rapidly for short intervals, as described
above.
Case ii: We schedule one thread for a relatively long time so that it owns the
cache for an interval roughly equal to one phase of the evolution of its working
set.
Explain.
Explanation / Answer
After nearly 40 years of wandering in the silicon wilderness in search of the promised land of CPU performance and power, computer-architecture deity Dr. David Patterson of Berkeley handed down his famous "Three Walls." They were not etched in stone, but they may as well have been: these three immovable impediments defined the end times of ever-increasing computing performance. They would prevent computer users from ever reaching the land of milk and honey and 10 GHz Pentiums. There may yet be a hole in the Walls, but for now we know them as:
"Power Wall + Memory Wall + ILP Wall = Brick Wall"
- The Power Wall means faster computers get really hot: power, and therefore heat, rises faster than clock speed.
- The Memory Wall means processors are starved for data: even 1,000 pins on a CPU package cannot move data on and off the chip fast enough to feed the cores.
- The ILP Wall means a deeper instruction pipeline really means digging a deeper power hole. (ILP stands for instruction-level parallelism.)
Taken together, they mean that computers will stop getting faster. Worse, optimizing against any one wall aggravates the other two. That is exactly what Intel did.
Intel's Tejas hits the walls - hard
Intel engineers went pedal to the metal straight into the Power Wall, backed up, gunned the gas, and went hard into the Memory Wall.
The industry was stunned when Intel cancelled not one but two premier processor designs in May 2004. Intel's Tejas CPU (the name is Sanskrit for fire) dissipated a stupendous 150 watts at 2.8 GHz, more than Hasbro's Easy-Bake Oven.
The Tejas had been projected to run 7 GHz. It never did. When microprocessors get too hot, they quit working and sometimes blow up.
The Memory Hierarchy And The Memory Wall
The term memory wall was coined in the mid-1990s to describe the growing disparity between CPU clock rates and off-chip memory and disk-drive I/O rates. An example from the GPU world illustrates the memory wall clearly.
In 2005, a leading-edge GPU had 192 floating-point cores, while today’s leading-edge GPU contains 512 floating-point cores. In the intervening six years, the primary GPU I/O pipe remained the same. The GPU of six years ago utilized 16 lanes of PCI Express Gen2, and so does today’s GPU. As a result, per-core I/O rates for GPUs have dropped by a factor of 2.7 since 2005.
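The factor of 2.7 is just the core-count ratio, since the I/O link itself never changed. As a quick sanity check (the 500 MB/s per-lane, per-direction figure for PCIe Gen2 is a standard number, not stated in the text above):

```python
# PCIe Gen2: ~500 MB/s usable per lane per direction (5 GT/s with 8b/10b encoding).
LANE_GBPS = 0.5
lanes = 16
link = lanes * LANE_GBPS                 # ~8 GB/s per direction, in both eras

per_core_2005 = link / 192               # GB/s per core, 192-core GPU
per_core_now = link / 512                # GB/s per core, 512-core GPU

# The per-core drop is simply 512/192, i.e. about 2.7x.
print(round(per_core_2005 / per_core_now, 1))   # → 2.7
```

The same link bandwidth divided among 2.7 times as many cores means each core waits 2.7 times longer, on average, for its share of the pipe.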
On-chip cache memory, which is 10 to 100 times faster than off-chip DRAM, was supposed to knock down the memory wall. But cache memories have their own set of problems. The L1 and L2 caches found on ARM-based application processors utilize more than half of the chip’s silicon area. As such, a significant percentage of processor power is consumed by cache memory, not by computations.
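The thought experiment in the question can also be made concrete with a toy simulation. The sketch below (all numbers are assumptions: an LRU cache of 256 lines shared by 8 threads, each thread's working set exactly as large as the cache, and purely random accesses within each working set) compares Case i, rapid round-robin scheduling in short quanta, against Case ii, one thread owning the cache for a long quantum. This is not a hardware model, just an illustration of why Case ii keeps more of a thread's working set resident:

```python
from collections import OrderedDict
import random

def run(schedule, cache_size):
    """Simulate an LRU cache shared by threads.

    schedule is a list of (thread_id, n_accesses) quanta, executed in
    order. Returns the overall cache hit rate.
    """
    cache = OrderedDict()          # most recently used entries at the end
    hits = accesses = 0
    # Each thread's working set is as large as the whole cache.
    working_set = {t: [(t, i) for i in range(cache_size)]
                   for t, _ in schedule}
    for tid, quantum in schedule:
        for _ in range(quantum):
            line = random.choice(working_set[tid])
            accesses += 1
            if line in cache:
                hits += 1
                cache.move_to_end(line)
            else:
                if len(cache) >= cache_size:
                    cache.popitem(last=False)   # evict the LRU line
                cache[line] = True
    return hits / accesses

random.seed(0)
C = 256
threads = range(8)
# Case i: many threads, short quanta, round-robin (200 rounds of 16 accesses).
case_i = [(t, 16) for _ in range(200) for t in threads]
# Case ii: each thread runs long enough to pull in its whole working set.
case_ii = [(t, 3200) for t in threads]
print("case i  hit rate:", round(run(case_i, C), 2))
print("case ii hit rate:", round(run(case_ii, C), 2))
```

Under these assumptions Case ii's hit rate is far higher: a long quantum pays the compulsory misses once and then runs almost entirely out of cache, while rapid interleaving lets the other threads evict most of a thread's lines before it runs again.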