Consider the following code and answer the following questions. Note that F2 reg
Question
Consider the following code and answer the following questions. Note that the F2 register holds a scalar constant that cannot be changed for the computation (see the MUL.D instruction).
a) Enable forwarding (check under the Configure tab). Run the code. How many stalls do you see? Can you identify where these stalls occur (the pair of instructions that causes each stall)?
b) Execute the code with Enable Branch Target Buffer checked (under the Configure tab). How many stalls do you see, and what exactly does the branch target buffer do? What is the CPI and what is the speedup when compared to (a)?
c) Execute the code with Enable Delay Slot checked (under the Configure tab). You will need to put one instruction in the delay slot to be executed, else the HALT instruction will stop the code from executing. What is the CPI and the speedup compared to (a)? Which scheme is better, branch target buffer or delay slot?
d) Re-arrange the loop without unrolling. You can move individual instructions, but the output of this dummy loop must be exactly the same, i.e. adjust the offsets of the memory instructions (load/store). Can you reduce the stalls for this code? What is the new CPI and the speedup when compared to (a)?
e) Now transform the loop by unrolling it, reschedule the instructions, and enable the delay slot or the branch target buffer to minimize the stalls completely. What is the CPI and what is the speedup when compared to (a)?
Explanation / Answer
a)
With forwarding enabled, the result of an instruction is forwarded directly to the next instruction that needs it, without waiting for the value to be written back to the register file.
Forwarding reduces the number of stall cycles caused by data hazards.
In the main function:
There are no stall cycles because there are no dependencies among the instructions.
The first instruction stores its value in R3.
The second instruction stores its value in R1.
The third instruction stores its value in R2.
In the LOOP:
The memory operand uses base-plus-offset addressing: its address is formed from the value held in the base register (R0 here).
The second instruction incurs one stall cycle: it reads F0, which the first instruction writes.
The fourth instruction incurs one stall cycle because of a dependency on F4.
The fifth instruction incurs one stall cycle because of a dependency on F0.
Total: 4 stall cycles with forwarding enabled (the three stalls listed above account for three of them).
The CPI is approximately one clock cycle per instruction; each stall cycle adds to the ideal value of 1.
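The CPI and speedup asked for in each part can be computed from the cycle and instruction counts that the simulator reports. The following is a minimal sketch only: the function names are made up for illustration, and the counts are placeholders, not values from the actual run.

```python
def cpi(cycles: int, instructions: int) -> float:
    """Cycles per instruction: total clock cycles divided by instructions executed."""
    return cycles / instructions


def speedup(baseline_cycles: int, new_cycles: int) -> float:
    """Speedup relative to the baseline run, assuming the same clock period."""
    return baseline_cycles / new_cycles


# Placeholder counts (hypothetical, NOT measured from the code in the question).
baseline_cycles, baseline_instructions = 120, 100   # stands in for part (a)
improved_cycles, improved_instructions = 110, 100   # stands in for a later part

print("baseline CPI:", cpi(baseline_cycles, baseline_instructions))
print("improved CPI:", cpi(improved_cycles, improved_instructions))
print("speedup vs (a):", speedup(baseline_cycles, improved_cycles))
```

Speedup is taken from total cycles rather than from CPI, because filling a delay slot or unrolling the loop can change the instruction count as well as the stall count.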
b)
To reduce the branch penalty further, a branch and its predicted target must be identified in the first pipeline stage (instruction fetch) by using a branch target buffer.
Because the branch target buffer is a true cache, the full PC value must be compared to confirm that the fetched instruction is a branch before any action is taken.
The number of stall cycles is the same as in (a), because the branching condition is not specified in the instruction.
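The lookup described above can be sketched as a small direct-mapped cache keyed by the branch PC. This is an illustrative model only: the class name, the table size, and the replacement policy are assumptions, not the simulator's actual implementation.

```python
from typing import List, Optional, Tuple


class BranchTargetBuffer:
    """Toy direct-mapped BTB: each slot holds (full_pc, predicted_target) or None."""

    def __init__(self, entries: int = 16) -> None:
        self.entries = entries
        self.table: List[Optional[Tuple[int, int]]] = [None] * entries

    def _index(self, pc: int) -> int:
        # Drop the two low bits (4-byte instructions), then wrap to the table size.
        return (pc >> 2) % self.entries

    def lookup(self, pc: int) -> Optional[int]:
        """Return the predicted target, or None if this PC is not a known taken branch."""
        slot = self.table[self._index(pc)]
        # The full PC is compared, so an aliasing address never yields a false hit.
        if slot is not None and slot[0] == pc:
            return slot[1]
        return None

    def update(self, pc: int, target: int, taken: bool) -> None:
        """Record a taken branch; drop the entry if the same branch falls through."""
        idx = self._index(pc)
        if taken:
            self.table[idx] = (pc, target)
        elif self.table[idx] is not None and self.table[idx][0] == pc:
            self.table[idx] = None


btb = BranchTargetBuffer(entries=4)
btb.update(pc=0x10, target=0x00, taken=True)
print(btb.lookup(0x10))  # hit: predicted target for the branch at 0x10
print(btb.lookup(0x20))  # same slot as 0x10, but the full-PC compare rejects it
```

A hit lets the fetch stage redirect to the predicted target on the next cycle; a miss means fetch continues sequentially.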
c)
A delay slot is the instruction slot immediately after a branch: the instruction placed there is fetched and executed whether or not the branch is taken. The point is that the pipeline has already fetched this instruction by the time the branch outcome is known, so executing it does useful work in a cycle that would otherwise be lost to a stall.
A pipeline diagram shows this: the instruction after the branch is carried out in either case.
The speedup is doubled if every two consecutive instructions are executed this way.
The delay slot scheme is better because it gives the larger speedup.
d)
Yes, the stall cycles can be reduced by placing independent instructions between dependent instructions.
In the loop, instructions 1 to 4 cause two stall cycles. Placing the 3rd instruction between the 1st and 2nd instructions removes them.
The 4th and 5th instructions cause one stall cycle. Placing the 6th instruction between the 4th and 5th instructions removes that stall.
The speedup increases because the loop now needs fewer cycles to execute the same instructions.
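The effect of rescheduling can be illustrated with a toy in-order issue model that counts stall cycles. Everything here is an assumption made for illustration: the loop below is a hypothetical stand-in (the original code is not shown), the one-cycle result latencies are invented, and branch-resolution timing is ignored.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Instr:
    text: str
    dest: Optional[str]      # register written, if any
    srcs: Tuple[str, ...]    # registers read
    latency: int = 0         # extra cycles before a dependent instruction may issue


def count_stalls(program: List[Instr]) -> int:
    """Count stall cycles for single-issue, in-order execution with forwarding."""
    ready = {}   # register -> earliest cycle at which a consumer may issue
    cycle = 0    # cycle in which the next instruction issues if nothing stalls
    stalls = 0
    for ins in program:
        earliest = max([cycle] + [ready.get(r, 0) for r in ins.srcs])
        stalls += earliest - cycle
        if ins.dest is not None:
            ready[ins.dest] = earliest + 1 + ins.latency
        cycle = earliest + 1
    return stalls


# Hypothetical loop body, NOT the code from the question.
original = [
    Instr("L.D   F0, 0(R1)",    "F0", ("R1",),       latency=1),
    Instr("MUL.D F4, F0, F2",   "F4", ("F0", "F2"),  latency=1),
    Instr("S.D   F4, 0(R1)",    None, ("F4", "R1")),
    Instr("DADDI R1, R1, -8",   "R1", ("R1",)),
    Instr("BNEZ  R1, loop",     None, ("R1",)),
]

# DADDI moved up between the load and the multiply; the store offset is
# adjusted from 0 to 8 so that it still writes the same memory location.
reordered = [
    Instr("L.D   F0, 0(R1)",    "F0", ("R1",),       latency=1),
    Instr("DADDI R1, R1, -8",   "R1", ("R1",)),
    Instr("MUL.D F4, F0, F2",   "F4", ("F0", "F2"),  latency=1),
    Instr("S.D   F4, 8(R1)",    None, ("F4", "R1")),
    Instr("BNEZ  R1, loop",     None, ("R1",)),
]

print("original stalls: ", count_stalls(original))
print("reordered stalls:", count_stalls(reordered))
```

Under these assumed latencies the reordered version hides the load-use stall behind the independent DADDI, while the stall between the multiply and the store remains because no further independent instruction is available without unrolling.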