Question

1. When processor designers consider a possible improvement to the processor datapath, the decision usually depends on the cost/performance trade-off. In the following three problems, assume that we are starting with a datapath from COD Figure 4.2 (The basic implementation of the MIPS subset), where I-Mem, Add, Mux, ALU, Regs, D-Mem, and Control blocks have latencies of 400 ps, 100 ps, 30 ps, 120 ps, 200 ps, 350 ps, and 100 ps, respectively, and costs of 1000, 30, 10, 100, 200, 2000, and 500, respectively. Consider the addition of a multiplier to the ALU. This addition will add 300 ps to the latency of the ALU and will add a cost of 600 to the ALU. The result will be 5% fewer instructions executed since we will no longer need to emulate the MUL instruction. What is the clock cycle time with and without this improvement? What is the speedup achieved by adding this improvement? Compare the cost/performance ratio with and without this improvement.

Explanation / Answer

Component    Latency (ps)    Cost
I-Mem        400             1000
Add          100             30
Mux          30              10
ALU          120             100
Regs         200             200
D-Mem        350             2000
Control      100             500

a) Cycle time without the improvement

The critical path determines the clock cycle time. In this datapath the longest path is the one a load instruction takes:

I-Mem (read the instruction), Regs (slower than Control, so Regs sets the delay of this stage), Mux (select the ALU input), ALU, D-Mem, and Mux (select the value from memory to be written into Regs)

The latency of this path is 400 ps + 200 ps + 30 ps + 120 ps + 350 ps + 30 ps = 1130 ps

Cycle time with the improvement

The ALU is on the critical path, so adding the multiplier (300 ps of extra ALU latency) lengthens the critical path by the full 300 ps.

The latency of this path is 1130 ps + 300 ps = 1430 ps
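As a quick check of the arithmetic in part a), the two cycle times can be recomputed with a short script. This is only a sketch: the names LATENCY_PS, CRITICAL_PATH, and cycle_time_ps are made up for illustration, and the path list simply restates the critical path given above.

```python
# Block latencies in picoseconds, copied from the problem statement.
LATENCY_PS = {
    "I-Mem": 400, "Add": 100, "Mux": 30, "ALU": 120,
    "Regs": 200, "D-Mem": 350, "Control": 100,
}

# Critical path as listed in the answer (Mux appears twice on it).
CRITICAL_PATH = ["I-Mem", "Regs", "Mux", "ALU", "D-Mem", "Mux"]


def cycle_time_ps(latency, path):
    """Sum the latency of every block along the given path."""
    return sum(latency[block] for block in path)


# Baseline datapath.
baseline = cycle_time_ps(LATENCY_PS, CRITICAL_PATH)

# Same datapath with the multiplier: the ALU becomes 300 ps slower.
improved_latency = dict(LATENCY_PS, ALU=LATENCY_PS["ALU"] + 300)
improved = cycle_time_ps(improved_latency, CRITICAL_PATH)

print(baseline, improved)  # 1130 1430
```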

b) The cycle time grows from 1130 ps to 1430 ps.

At the same time, 5% fewer instructions are executed, since we no longer need to emulate the MUL instruction. Execution time is proportional to instruction count times cycle time, so

Speedup = (1/0.95) * (1130/1430) = 0.83

=> The improved processor runs slower than the original processor.
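The speedup figure can be reproduced with a small helper. This is a sketch under the same assumption the answer makes, namely that execution time scales with instruction count times cycle time; the function name speedup and its parameters are illustrative, not from the textbook.

```python
def speedup(old_cycle_ps, new_cycle_ps, instruction_ratio):
    """Old execution time divided by new execution time.

    instruction_ratio is new instruction count / old instruction count
    (0.95 here, i.e. 5% fewer instructions).
    """
    old_time = 1.0 * old_cycle_ps                 # normalized: 1 instruction unit
    new_time = instruction_ratio * new_cycle_ps   # fewer, but slower, cycles
    return old_time / new_time


s = speedup(old_cycle_ps=1130, new_cycle_ps=1430, instruction_ratio=0.95)
print(round(s, 2))  # 0.83
```

A value below 1 means the change is a slowdown: the 5% drop in instruction count does not make up for the roughly 27% longer cycle.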

c) The cost is the total cost of all components in the datapath (I-Mem, Regs, Control, ALU, D-Mem, 2 Add units, and 3 Mux units).

Total cost = 1000 + 200 + 500 + 100 + 2000 + 2*30 + 3*10 = 3890

The addition of a multiplier to the ALU adds a cost of 600 to the ALU.

=> New cost = 3890 + 600 = 4490

=> Relative cost = 4490/3890 = 1.15

Cost/performance ratio = 1.15/0.83 = 1.39

Without the improvement the relative cost and relative performance are both 1, so the baseline ratio is 1. A ratio of 1.39 shows that we are paying about 15% more for a processor that performs worse.
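The cost side of part c) can be verified the same way. This is a sketch only: the component counts (2 Add units, 3 Mux units) are the ones the answer assumes for the datapath, and the variable names are illustrative.

```python
# Component costs from the problem statement.
COST = {
    "I-Mem": 1000, "Add": 30, "Mux": 10, "ALU": 100,
    "Regs": 200, "D-Mem": 2000, "Control": 500,
}

# How many of each component the answer counts in the datapath.
COUNT = {
    "I-Mem": 1, "Regs": 1, "Control": 1, "ALU": 1,
    "D-Mem": 1, "Add": 2, "Mux": 3,
}

old_cost = sum(COST[block] * n for block, n in COUNT.items())
new_cost = old_cost + 600                  # multiplier adds 600 to the ALU
relative_cost = new_cost / old_cost

# Speedup from part b): (1/0.95) * (1130/1430).
relative_performance = 1130 / (0.95 * 1430)

cost_performance = relative_cost / relative_performance

print(old_cost, new_cost)          # 3890 4490
print(round(relative_cost, 2))     # 1.15
print(round(cost_performance, 2))  # 1.39
```

Using the unrounded intermediate values gives the same 1.39 as dividing the rounded 1.15 by 0.83.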