Consider the following code and answer the following questions. Note that F2 reg

ID: 3861081 • Letter: C

Question

Consider the following code and answer the following questions. Note that F2 register holds a scalar constant that cannot be changed for the computation (see MUL.D instruction)

.data

.text

main:

DADDI R3,R0,8

DADDI R1,R0,1024

DADDI R2,R0,1024

Loop: L.D F0,0(R1)

MUL.D F0,F0,F2

L.D F4,0(R2)

ADD.D F0,F0,F4

S.D F0,0(R2)

DSUB R1,R1,R3

DSUB R2,R2,R3

BNEZ R1,Loop

HALT

(a) Enable forwarding (check under the Configure tab). Run the code. How many stalls do you see? Can you identify where these stalls occur (the pair of instructions that causes each stall)? Hint: Run in Single Cycle mode using F7. What is the CPI?

(b) Execute the code by enabling Enable Branch Target Buffer (check under the Configure tab). How many stalls do you see, and what exactly does the Enable Branch Target Buffer option do? What is the CPI and what is the speedup when compared to (a)?

(c) Execute the code by enabling Enable Delay Slot (check under the Configure tab). You will need to place one instruction in the delay slot to be executed; otherwise the HALT instruction will stop the code from executing. What is the CPI and the speedup compared to (a)? Which scheme is better, branch target buffer or delay slot?

(d) Re-arrange the loop without unrolling. You can move individual instructions, however the output of this dummy loop should be exactly the same i.e. adjust the offset for memory instructions (load/store). Can you reduce the stalls for this code? What is the new CPI and the speedup when compared to (a)?

(e) Now, transform the loop by unrolling the loop, reschedule the instructions, enable delay slot or branch target buffer to completely minimize the stalls. What is the CPI and what is the speedup when compared to (a)?


Explanation / Answer

With forwarding enabled, a result is passed directly from the pipeline stage that produces it to the following instruction that needs it, without waiting for the write-back stage.

Forwarding reduces the number of stall cycles caused by data hazards.

In the main section:

There are no stall cycles because there are no dependencies among the three setup instructions.

The first instruction stores 8 in R3.

The second instruction stores 1024 in R1.

The third instruction stores 1024 in R2.

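In the loop:

The loads and stores use base-plus-displacement addressing: the operand is at the memory address formed by adding the offset (0) to the contents of the base register (R1 or R2).

The second instruction (MUL.D F0,F0,F2) incurs one stall cycle: it reads F0, which the first instruction (L.D F0,0(R1)) is still loading.

The fourth instruction (ADD.D F0,F0,F4) incurs one stall cycle: it reads F4, which the third instruction (L.D F4,0(R2)) is still loading.

The fifth instruction (S.D F0,0(R2)) incurs one stall cycle: it stores F0, which the fourth instruction (ADD.D F0,F0,F4) is still computing.

Total: 4 stall cycles with forwarding.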
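
The dependent pairs named above can be found mechanically. The following Python sketch is an illustration only and not part of the simulator: the helper names `reads_and_writes` and `adjacent_raw_pairs` are made up here, and the check flags only read-after-write dependences between adjacent instructions. It does not model functional-unit latencies or the branch, so it shows where data stalls arise, not how many cycles each one costs.

```python
import re

# Loop body copied from the question.
LOOP = [
    "L.D F0,0(R1)",
    "MUL.D F0,F0,F2",
    "L.D F4,0(R2)",
    "ADD.D F0,F0,F4",
    "S.D F0,0(R2)",
    "DSUB R1,R1,R3",
    "DSUB R2,R2,R3",
    "BNEZ R1,Loop",
]


def reads_and_writes(instr):
    """Return (destination register or None, set of source registers)."""
    opcode, operands = instr.split(maxsplit=1)
    regs = re.findall(r"\b[RF]\d+\b", operands)
    if opcode in ("S.D", "BNEZ"):  # these write no register
        return None, set(regs)
    return regs[0], set(regs[1:])


def adjacent_raw_pairs(program):
    """List (producer, consumer, register) for back-to-back RAW dependences."""
    pairs = []
    for producer, consumer in zip(program, program[1:]):
        dest, _ = reads_and_writes(producer)
        _, sources = reads_and_writes(consumer)
        if dest is not None and dest in sources:
            pairs.append((producer, consumer, dest))
    return pairs


for producer, consumer, reg in adjacent_raw_pairs(LOOP):
    print(f"{producer:16} -> {consumer:16} via {reg}")
```

For this loop the sketch reports three pairs: L.D/MUL.D on F0, L.D/ADD.D on F4, and ADD.D/S.D on F0.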

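The ideal CPI of the pipeline is 1 clock cycle per instruction; every stall cycle raises the measured CPI above 1, since CPI = total clock cycles / instructions executed.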
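
As a small worked example of the CPI formula, the Python sketch below counts the instructions the program executes and divides a cycle count by that number. The function names are invented for illustration, the count assumes HALT is included as one executed instruction, and the cycle count is a made-up placeholder: the real value has to be read from the simulator after running the code.

```python
def dynamic_instruction_count(start=1024, step=8, setup=3, body=8, tail=1):
    """Instructions executed by the program in the question.

    R1 counts down from `start` to 0 in steps of `step`, so the 8-instruction
    loop body runs start // step times. Three DADDI instructions run before
    the loop and HALT runs once after it.
    """
    iterations = start // step
    return setup + body * iterations + tail


def cpi(total_cycles, instructions):
    """Cycles per instruction."""
    return total_cycles / instructions


instructions = dynamic_instruction_count()
print(instructions)  # 3 + 8 * 128 + 1 = 1028

hypothetical_cycles = 1542  # made-up value, NOT a simulator result
print(cpi(hypothetical_cycles, instructions))  # 1.5
```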

To reduce the branch penalty further, the processor needs to identify a branch and its predicted target in the first (fetch) stage, which is what a branch target buffer does.

The branch target buffer is a true cache: the full PC value must be compared to confirm that the fetched instruction is a branch before any action is taken.

The data-hazard stall cycles are the same as in (a), because the branch target buffer changes only how the branch BNEZ R1,Loop is handled, not the dependencies between the other instructions.
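
The cache-like behaviour described above can be sketched in Python. This is a generic textbook-style branch target buffer, not the simulator's actual implementation: the class and function names, the 16-entry direct-mapped organisation, and the instruction addresses (4-byte instructions starting at address 0, which puts Loop at 12 and BNEZ at 40) are all assumptions made for the illustration.

```python
class BranchTargetBuffer:
    """Direct-mapped BTB: each entry holds (full branch PC, predicted target)."""

    def __init__(self, entries=16):
        self.entries = entries
        self.table = {}

    def _index(self, pc):
        return (pc // 4) % self.entries

    def predict(self, pc):
        """Return the predicted target, or None to predict 'not taken'."""
        entry = self.table.get(self._index(pc))
        if entry is not None and entry[0] == pc:  # full-PC tag compare
            return entry[1]
        return None

    def update(self, pc, taken, target):
        """Record a taken branch; drop the entry if the branch fell through."""
        index = self._index(pc)
        if taken:
            self.table[index] = (pc, target)
        elif index in self.table and self.table[index][0] == pc:
            del self.table[index]


def count_mispredictions(iterations, branch_pc=40, loop_pc=12):
    """Replay the loop-closing branch and count wrong predictions."""
    btb = BranchTargetBuffer()
    mispredictions = 0
    for i in range(iterations):
        taken = i < iterations - 1  # the last pass falls through to HALT
        predicted = btb.predict(branch_pc)
        predicted_taken = predicted is not None
        if predicted_taken != taken or (taken and predicted != loop_pc):
            mispredictions += 1
        btb.update(branch_pc, taken, loop_pc)
    return mispredictions


print(count_mispredictions(128))  # 2
```

Under these assumptions only the first encounter of the branch (no entry yet) and the final loop exit are mispredicted; every pass in between is predicted correctly.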

With a delay slot, the instruction that immediately follows a branch is fetched and executed whether or not the branch is taken. The point is to do useful work in the cycle that would otherwise be lost while the branch is being resolved, so the slot should hold an instruction that must run on every iteration anyway.

This is what a pipeline diagram shows: the instruction after the branch is carried out in either case.

The speedup over (a) comes from removing the stall after each taken branch, provided the delay slot is filled with a useful instruction rather than a NOP.

The delay slot scheme is better because it gives the larger speedup.

Yes, we can reduce the stall cycles by placing independent instructions between dependent instructions.

In the loop, instructions 1 to 4 cause two stall cycles, so place the 3rd instruction (L.D F4,0(R2)) between the 1st and 2nd instructions; both loads then complete before their values are needed and those two stall cycles disappear.

The 4th and 5th instructions (ADD.D and S.D) cause a stall cycle; placing the 6th instruction (DSUB R1,R1,R3) between them removes that stall cycle.
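
Part (d) requires the rearranged loop to produce exactly the same output, which can be checked with a tiny functional model. The Python sketch below is a hypothetical checker, not the simulator: it models only the register and memory effects of the five opcodes used in the loop, with made-up memory contents and a made-up value for F2. `REORDERED` follows the rearrangement described above; `HOISTED` additionally moves DSUB R2,R2,R3 above the store, which is the case where the store offset has to change from 0 to 8.

```python
def run(loop_body, scalar=3.0):
    """Run the setup code, then loop_body until R1 reaches 0.

    The closing BNEZ R1,Loop is implied at the end of each pass.
    """
    memory = {addr: float(addr) for addr in range(0, 1032, 8)}  # made-up data
    r = {"R0": 0, "R1": 1024, "R2": 1024, "R3": 8}  # values set by the DADDIs
    f = {"F2": scalar}  # made-up scalar constant
    while True:
        for op, a, b, c in loop_body:
            if op == "L.D":      # L.D a, b(c)
                f[a] = memory[r[c] + b]
            elif op == "S.D":    # S.D a, b(c)
                memory[r[c] + b] = f[a]
            elif op == "MUL.D":
                f[a] = f[b] * f[c]
            elif op == "ADD.D":
                f[a] = f[b] + f[c]
            elif op == "DSUB":
                r[a] = r[b] - r[c]
        if r["R1"] == 0:         # BNEZ R1,Loop not taken
            return memory


LOAD_F0 = ("L.D", "F0", 0, "R1")
LOAD_F4 = ("L.D", "F4", 0, "R2")
MUL = ("MUL.D", "F0", "F0", "F2")
ADD = ("ADD.D", "F0", "F0", "F4")
SUB_R1 = ("DSUB", "R1", "R1", "R3")
SUB_R2 = ("DSUB", "R2", "R2", "R3")

ORIGINAL = [LOAD_F0, MUL, LOAD_F4, ADD, ("S.D", "F0", 0, "R2"), SUB_R1, SUB_R2]
REORDERED = [LOAD_F0, LOAD_F4, MUL, ADD, SUB_R1, ("S.D", "F0", 0, "R2"), SUB_R2]
HOISTED = [LOAD_F0, LOAD_F4, MUL, ADD, SUB_R1, SUB_R2, ("S.D", "F0", 8, "R2")]
BROKEN = [LOAD_F0, LOAD_F4, MUL, ADD, SUB_R1, SUB_R2, ("S.D", "F0", 0, "R2")]

print(run(ORIGINAL) == run(REORDERED))  # True
print(run(ORIGINAL) == run(HOISTED))    # True
print(run(ORIGINAL) == run(BROKEN))     # False: offset was not adjusted
```

The model says nothing about timing; it only confirms that a candidate schedule leaves memory in the same state as the original loop.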

The speedup increases because the reordering reduces the number of clock cycles that the same instructions need.
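
Speedup relative to (a) is a ratio of cycle counts. The Python sketch below states the arithmetic; the function names and every number in it are made up for illustration and are not simulator measurements. Comparing total cycles rather than CPI matters once the instruction count changes, as it does when a NOP fills a delay slot or when the loop is unrolled.

```python
def total_cycles(instructions, cpi):
    """Execution time in clock cycles."""
    return instructions * cpi


def speedup(baseline_cycles, improved_cycles):
    """How many times faster the improved run is than the baseline."""
    return baseline_cycles / improved_cycles


# Made-up numbers: same CPI, but the second version executes fewer instructions.
baseline = total_cycles(instructions=1000, cpi=2.0)  # 2000.0 cycles
improved = total_cycles(instructions=800, cpi=2.0)   # 1600.0 cycles

print(speedup(baseline, improved))  # 1.25
```

In this made-up case the CPI is unchanged, yet the program is 1.25 times faster, which is why the comparison should use cycle counts.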
