

Question

6. Multicore and the Memory Wall [10 marks]

Relative to each other, CPUs have few threads and enormous caches, while GPUs have many threads and tiny caches. Generally speaking, we need to think carefully about any hardware architecture in which many hardware threads share a cache (usually, but not always, the last-level cache). GPUs are the extreme example of such an architecture.

Suppose threads are cache friendly in the sense that each portion of a thread accesses a small subset of memory, and that this subset changes relatively slowly. But threads are normally interleaved with some frequency, rather than being allowed to run to completion. The technical term for this slowly changing subset is the thread's _working set_.

Now suppose that the working set of a thread is roughly the same size as the shared cache. Further suppose that the hardware schedules threads much more rapidly than the speed with which their working sets evolve.

As a thought experiment, in which case is a thread more likely to find its data in cache:

Case i: We schedule many threads rapidly for short intervals, as described above.

Case ii: We schedule one thread for a relatively long time, so that it owns the cache for an interval roughly equal to one phase of the evolution of its working set.

Explain.

Explanation / Answer

After nearly 40 years wandering in the silicon wilderness searching for the promised land of CPU performance and power, the computer-architecture deity, Berkeley's Dr. David Patterson, handed down his famous "Three Walls." They were not etched in stone, but they may as well have been. These three immovable impediments defined the end times of ever-increasing computing performance. They would prevent computer users from ever reaching the land of milk and honey and 10 GHz Pentiums. There may yet be a hole in the Walls, but for now we know them as:

"Power Wall + Memory Wall + ILP Wall = Brick Wall"

- The Power Wall means faster computers get really hot: power density, not transistor count, now limits clock speed.
- The Memory Wall means memory can't keep pace with the CPU: even 1,000 pins on a CPU package can't feed the cores fast enough.
- The ILP Wall means a deeper instruction pipeline really means digging a deeper power hole. (ILP stands for instruction-level parallelism.)

Taken together, they mean that computers will stop getting faster. Worse, if an engineer optimizes against one wall, he aggravates the other two. That is exactly what Intel did.

Intel's Tejas hits the walls - hard
Intel engineers went pedal to the metal straight into the Power Wall, backed up, gunned the gas, and went hard into the Memory Wall.

The industry was stunned when Intel cancelled not one but two premier processor designs in May of 2004. Intel's Tejas CPU, Sanskrit for fire, dissipated a stupendous 150 watts at 2.8 GHz, more than Hasbro's Easy Bake Oven.

The Tejas had been projected to run at 7 GHz. It never did. When microprocessors get too hot, they quit working and sometimes blow up.

The Memory Hierarchy And The Memory Wall

The term memory wall was coined in the mid-1990s to describe the growing disparity, visible since the 1980s, between CPU clock rates and off-chip memory and disk-drive I/O rates. An example from the GPU world clearly illustrates the memory wall.

In 2005, a leading-edge GPU had 192 floating-point cores, while today’s leading-edge GPU contains 512 floating-point cores. In the intervening six years, the primary GPU I/O pipe remained the same. The GPU of six years ago utilized 16 lanes of PCI Express Gen2, and so does today’s GPU. As a result, per-core I/O rates for GPUs have dropped by a factor of 2.7 since 2005.
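The 2.7× figure follows directly from the core counts quoted above; since the x16 PCIe pipe is unchanged, per-core bandwidth falls by exactly the factor the core count grew (a quick sanity check — the core counts are the article's, not ours):

```python
cores_then, cores_now = 192, 512   # figures quoted in the text

# The I/O pipe is fixed, so per-core bandwidth drops by the same
# factor that the core count grew.
drop = cores_now / cores_then
print(round(drop, 1))  # → 2.7
```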

On-chip cache memory, which is 10 to 100 times faster than off-chip DRAM, was supposed to knock down the memory wall. But cache memories have their own set of problems. The L1 and L2 caches on ARM-based application processors occupy more than half of the chip's silicon area. Consequently, a significant percentage of processor power is consumed by cache memory, not by computations.
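The question's thought experiment can also be sketched as a small simulation. This is a hypothetical setup, not anything from the question itself: it assumes an LRU shared cache, per-thread working sets exactly as large as the cache, working sets that don't evolve during the run, and round-robin scheduling; the cache size, thread count, and quantum lengths are illustrative.

```python
import random
from collections import OrderedDict

class LRUCache:
    """A shared cache that evicts the least-recently-used line."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()

    def access(self, addr):
        """Touch addr; return True on a hit, False on a miss."""
        if addr in self.lines:
            self.lines.move_to_end(addr)
            return True
        if len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)  # evict the LRU line
        self.lines[addr] = None
        return False

def hit_rate(num_threads, quantum, total_accesses, cache_size=1024):
    # Each thread's working set is the same size as the shared cache,
    # and (for simplicity) does not evolve during the run.
    working_sets = [range(t * cache_size, (t + 1) * cache_size)
                    for t in range(num_threads)]
    cache = LRUCache(cache_size)
    rng = random.Random(42)
    hits = done = t = 0
    while done < total_accesses:
        for _ in range(quantum):        # run thread t for one quantum
            if cache.access(rng.choice(working_sets[t])):
                hits += 1
            done += 1
            if done >= total_accesses:
                break
        t = (t + 1) % num_threads       # round-robin to the next thread
    return hits / total_accesses

# Case i: many threads, short quanta — the threads evict each other's lines.
case_i = hit_rate(num_threads=8, quantum=64, total_accesses=200_000)
# Case ii: one thread owns the cache for the whole phase — mostly hits
# after the compulsory misses that warm the cache.
case_ii = hit_rate(num_threads=1, quantum=200_000, total_accesses=200_000)
print(f"case i hit rate:  {case_i:.2f}")
print(f"case ii hit rate: {case_ii:.2f}")
```

With these made-up parameters, case ii's hit rate approaches 1 once the working set is resident, while case i's collapses, because eight cache-sized working sets cannot fit in one cache — the intuition the thought experiment is probing.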
