Question

a) Suppose we have a regular 5-stage pipeline (with the associated forwarding units). Assume that the branch decision hardware and jumps are in the second stage, and that static branch prediction is Not Taken. Now, instead of memory taking 1 cycle for every access, suppose we have a memory hierarchy with the following parameters. On a cache hit, the data is obtained in 1 clock cycle. On a cache miss, the data takes an additional 2 clock cycles to arrive. That is, it takes 1 clock cycle to tell whether there is a hit or a miss, and on a miss the pipeline has to stall for an additional 2 cycles for the data to become available. Now consider the following code:

    Loop: lw   $s1, 20($s0)
          addi $t1, $s1, 4
          addi $t2, $s1, 5
          addi $s3, $s1, 6
          addi $t4, $s1, 7
          beq  $s3, $s0, L1
          add  $s3, $s0, $zero
          j    L2
    L1:   addi $s3, $s0, 4
    L2:   add  $t5, $t2, $t1
          add  $t6, $t3, $t4
          add  $t1, $t5, $t6
          addi $s0, $s0, 4
          j    Loop

Assume $s0 and $s3 initially store 0, and that we have a direct-mapped cache with 16 words and a block size of one word. On average, how many total stalls and flushes occur per iteration of the loop? Show your work.

b) Repeat the same computation for a direct-mapped cache with a block size of 2 words.

Explanation / Answer

Answer (a)

With a block size of one word:

As given, $s0 and $s3 initially store 0. The first instruction is a load from offset 20.

lw $s1, 20($s0) means $s1 <- M[$s0 + 20]. Since $s0 = 0 in the first iteration, the effective address is 20, so $s1 <- M[20].

addi $t1, $s1, 4 then computes $t1 <- M[20] + 4.
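The address arithmetic above can be checked with a short Python sketch. It assumes byte addressing with 4-byte words and that addi $s0, $s0, 4 advances $s0 by 4 on every iteration; the function name lw_address_map is hypothetical. It lists the effective address, cache index, and tag of lw $s1, 20($s0) for the 16-word, one-word-block cache of part (a).

```python
def lw_address_map(iterations, cache_words=16, block_words=1):
    """Return (byte address, cache index, tag) of lw $s1, 20($s0) per iteration.

    Hypothetical helper; assumes byte addressing, 4-byte words,
    and $s0 = 0, 4, 8, ... (addi $s0, $s0, 4 at the end of each iteration).
    """
    num_lines = cache_words // block_words      # 16 lines for one-word blocks
    rows = []
    for i in range(iterations):
        addr = 4 * i + 20                       # effective address: $s0 + 20
        block = addr // (4 * block_words)       # block (here: word) address
        rows.append((addr, block % num_lines, block // num_lines))
    return rows


for addr, index, tag in lw_address_map(4):
    print(f"addr={addr:3d} index={index:2d} tag={tag}")
```

Each iteration touches a word that has not been loaded before, so with one-word blocks no load finds its data already cached. When the index wraps around after 16 iterations (address 84 maps to index 5 again), the tag differs, so that access misses as well.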

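The rest of the code executes the same way. Forwarding cannot fully hide the data hazard between lw and the addi that uses $s1 immediately after it, because the loaded value is available only at the end of the MEM stage. The dependent instruction therefore stalls one cycle on a cache hit, and a cache miss adds two more stall cycles while the data arrives.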

Total number of stalls and flushes:

Stalls: 8

Flushes (to clear the pipeline): 4
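One way to do the per-iteration accounting is the small Python model below. It is a sketch under assumptions not stated in the original answer: forwarding reaches both EX and the ID-stage branch comparator (so beq's use of $s3, written two instructions earlier, needs no stall), instruction fetches always hit, and every redirect by beq or j in ID flushes exactly one fetched instruction. The names iteration_cost, LOAD_USE_STALL, MISS_PENALTY and FLUSH_PER_REDIRECT are hypothetical. The beq outcome depends on M[20 + $s0], which the question does not give, so the model evaluates both paths.

```python
# Hypothetical per-iteration cost model for the loop in the question.
# Assumptions (not stated in the original answer):
#   - forwarding exists into EX and into the ID-stage branch comparator,
#     so beq's use of $s3 (written two instructions earlier) needs no stall
#   - instruction fetches always hit; only lw goes through the data cache
#   - beq and j are resolved in ID, so each redirect flushes one instruction
LOAD_USE_STALL = 1      # addi $t1 uses $s1 right after lw, even with forwarding
MISS_PENALTY = 2        # extra cycles the pipeline freezes on a data-cache miss
FLUSH_PER_REDIRECT = 1  # one wrong-path instruction is in IF when ID redirects


def iteration_cost(load_missed, branch_taken):
    """Return (stalls, flushes) for one pass through the loop body."""
    stalls = LOAD_USE_STALL + (MISS_PENALTY if load_missed else 0)
    if branch_taken:
        # beq -> L1: addi -> L2 ... -> j Loop
        redirects = ["beq taken", "j Loop"]
    else:
        # beq falls through (correctly predicted) -> add -> j L2 -> L2 ... -> j Loop
        redirects = ["j L2", "j Loop"]
    flushes = FLUSH_PER_REDIRECT * len(redirects)
    return stalls, flushes


# With one-word blocks every lw misses; try both outcomes of beq.
for taken in (False, True):
    print(taken, iteration_cost(load_missed=True, branch_taken=taken))
```

Under these assumptions both branch outcomes cost the same: one control flush from the taken beq or from j L2, plus one from j Loop. A missing load contributes 1 + 2 stall cycles. A different set of assumptions, for example also sending instruction fetches through the 16-word cache, would change these totals.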

Answer (b)

With a block size of two words:

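Executing the same code with a block size of two words needs fewer stalls and flushes. In particular, each miss also brings the adjacent word into the cache, so the load in the following iteration can hit.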

Stalls: 5

Flushes: 3
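The effect of the larger block on the load can be simulated with the Python sketch below. It assumes byte addressing with 4-byte words, $s0 = 0, 4, 8, ... per iteration, and the same stall costs as part (a) (1 load-use stall plus 2 extra cycles per miss). The function lw_miss_flags and the constant names are hypothetical.

```python
def lw_miss_flags(iterations, cache_words=16, block_words=2):
    """Simulate lw $s1, 20($s0) in a direct-mapped cache; True marks a miss.

    Hypothetical helper; assumes byte addresses, 4-byte words,
    and $s0 = 0, 4, 8, ... per iteration.
    """
    num_lines = cache_words // block_words
    tags = [None] * num_lines                      # None marks an invalid line
    flags = []
    for i in range(iterations):
        block = (20 + 4 * i) // (4 * block_words)  # block address of the load
        index, tag = block % num_lines, block // num_lines
        flags.append(tags[index] != tag)           # miss if line holds another block
        tags[index] = tag                          # block is (re)loaded on a miss
    return flags


LOAD_USE_STALL, MISS_PENALTY = 1, 2
flags = lw_miss_flags(1000)
avg_stalls = sum(LOAD_USE_STALL + MISS_PENALTY * miss for miss in flags) / len(flags)
print(flags[:6])   # [True, True, False, True, False, True]
print(avg_stalls)  # about 2 stall cycles per iteration
print(sum(lw_miss_flags(1000, block_words=1)))  # 1000: one-word blocks always miss
```

Address 20 sits in the second half of its two-word block, so the first two loads miss; after that the loads alternate miss and hit, giving a miss rate that approaches 50% and an average close to 1 + 2 × 0.5 = 2 stall cycles per iteration. In this model only the stall count depends on the block size; the control flushes come from the branch and the jumps and are unaffected by the cache.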