You will be guided through the complete hardware design of a register file and w
ID: 2268661 • Letter: Y
Question
You will be guided through the complete hardware design of a register file and will investigate of some tradeoffs of using this type of device. Please answer the following questions about register files and other memory storage units A. Usually, register files are used when doing certain kinds of look-up operations in a CPU because they have very few gate delays. However, the memory hierarchies on a computer are implemented with SRAM and DRAM rather than with registers. Why are registers not considered to be as scalable as SRAM and DRAM? Explain the meaning of the words "static" and "dynamic" in the context of SRAM and DRAM. Consider the dual read-port, single write-port 32 × 32 register file seen in class. Construct a full gate-level circuit implementation of such a device. Use a hierarchical approach by building up the device from smaller building blocks. You only have to show the implementation of each building block once before using it as a black bo:x. B.Explanation / Answer
A) why registers are not considered to be as scalable as SRAM and DRAM
In real, there are tiny implementations that does not separate registers from memory. They can expose it, for example, in the way they have 512 bytes of RAM, and first 64 of them are exposed as 32 16-bit registers and in the same time accessible as addressable RAM. Or, another example, MosTek 6502 "zero page" (RAM range 0-255, accessed used 1-byte address) was a poor substitution for registers, due to small amount of real registers in CPU. But, this is poorly scalable to larger setups.
The advantage of registers are following:
Additionally, there are registers that could not be implemented in separate block RAM because access to RAM needs their change. I mean the "execution phase" register in the simplest CPU designs, which takes values like "instruction extracting phase", "instruction decoding phase", "ALU phase", "data writing phase" and so on, and this register equivalents in more complicated (pipeline, out-of-order) designs; also different buffer registers on bus access, and so on. But, such registers are not visible to programmer, so you did likely not mean them.
Registers are placed inside the CPU, right to the ALU so signals can travel almost right away. They're also the fastest memory type but they take significant space so we can have only a limited number of them. Increasing the number of registers increases
There's many reasons you don't just have a huge number of registers:
They're highly linked to most pipeline stages. For starters, you need to track their lifetime, and forward results back to previous stages. The complexity gets intractable very quickly, and the number of wires (literally) involved grows at the same rate. It's expensive on area, which ultimately means it's expensive on power, price and performance after a certain point.
It takes up instruction encoding space. 16 registers takes up 4 bits for source and destination, and another 4 if you have 3-operand instructions (e.g ARM). That's an awful lot of instruction set encoding space taken up just to specify the register. This eventually impacts decoding, code size and again complexity.
There's better ways to achieve the same result...
These days we really do have lots of registers - they're just not explicitly programmed. We have "register renaming". While you only access a small set (8-32 registers), they're actually backed by a much larger set (e.g 64-256). The CPU then tracks the visibility of each register, and allocates them to the renamed set. For example, you can load, modify, then store to a register many times in a row, and have each of these operations actually performed independently depending on cache misses etc. In ARM:
ldr r0, [r4]
add r0, r0, #1
str r0, [r4]
ldr r0, [r5]
add r0, r0, #1
str r0, [r5]
Cortex A9 cores do register renaming, so the first load to "r0" actually goes to a renamed virtual register - let's call it "v0". The load, increment and store happen on "v0". Meanwhile, we also perform a load/modify/store to r0 again, but that'll get renamed to "v1" because this is an entirely independent sequence using r0. Let's say the load from the pointer in "r4" stalled due to a cache miss. That's ok - we don't need to wait for "r0" to be ready. Because it's renamed, we can run the next sequence with "v1" (also mapped to r0) - and perhaps that's a cache hit and we just had a huge performance win.
ldr v0, [v2]
add v0, v0, #1
str v0, [v2]
ldr v1, [v3]
add v1, v1, #1
str v1, [v3]
I think x86 is up to a gigantic number of renamed registers these days (ballpark 256). That would mean having 8 bits times 2 for every instruction just to say what the source and destination is. It would massively increase the number of wires needed across the core, and its size. So there's a sweet spot around 16-32 registers which most designers have settled for, and for out-of-order CPU designs, register renaming is the way to mitigate it.
Edit: The importance of out-of-order execution and register renaming on this. Once you have OOO, the number of registers doesn't matter so much, because they're just "temporary tags" and get renamed to the much larger virtual register set. You don't want the number to be too small, because it gets difficult to write small code sequences. This is a problem for x86-32, because the limited 8 registers means a lot of temporaries end up going through the stack, and the core needs extra logic to forward reads/writes to memory. If you don't have OOO, you're usually talking about a small core, in which case a large register set is a poor cost/performance benefit.
So there's a natural sweet spot for register bank size which maxes out at about 32 architected registers for most classes of CPU. x86-32 has 8 registers and it's definitely too small. ARM went with 16 registers and it's a good compromise. 32 registers is slightly too many if anything - you end up not needing the last 10 or so.
None of this touches on the extra registers you get for SSE and other vector floating point coprocessors. Those make sense as an extra set because they run independently of the integer core, and don't grow the CPU's complexity exponentially.
B)
There are two types of Random Access Memory or RAM, each has its own advantages and disadvantages compared to the other. SRAM (Static RAM) and DRAM (Dynamic RAM) holds data but in a different ways. DRAM requires the data to be refreshed periodically in order to retain the data. SRAM does not need to be refreshed as the transistors inside would continue to hold the data as long as the power supply is not cut off. This behavior leads to a few advantages, not the least of which is the much faster speed that data can be written and read.
The additional circuitry and timing needed to introduce the refresh creates some complications that makes DRAM memory slower and less desirable than SRAM. One complication is the much higher power used by DRAM memory, this difference is very significant in battery powered devices. SRAM modules are also much simpler compared to DRAM, which makes it easier for most people to create an interface to access the memory. This makes it easier to work with for hobbyists and even for prototyping.
Structurally, SRAM needs a lot more transistors in order to store a certain amount of memory. A DRAM module only needs a transistor and a capacitor for every bit of data where SRAM needs 6 transistors. Because the number of transistors in a memory module determine its capacity, a DRAM module can have almost 6 times more capacity with a similar transistor count to an SRAM module. This ultimately boils down to price, which is what most buyers are really concerned with.
Because of its lower price, DRAM has become the mainstream in computer main memory despite being slower and more power hungry compared to SRAM. SRAM memory is still used in a lot of devices where speed is more crucial than capacity. The most prominent use of SRAM is in the cache memory of processors where speed is very essential, and the low power consumption translates to less heat that needs to be dissipated. Even hard drives, optical drives, and other devices that needs cache memory or buffers use SRAM modules.
SRAM is static while DRAM is dynamic
2. SRAM is faster compared to DRAM
3. SRAM consumes less power than DRAM
4. SRAM uses more transistors per bit of memory compared to DRAM
5. SRAM is more expensive than DRAM
6. Cheaper DRAM is used in main memory while SRAM is commonly used in cache memory
Related Questions
drjack9650@gmail.com
Navigate
Integrity-first tutoring: explanations and feedback only — we do not complete graded work. Learn more.