
How long (in cycles) will the following loop take to execute per iteration?


Question

How long (in cycles) will the following loop take to execute per iteration?

Assume the processor uses the 5-cycle memory pipeline from the text (FET/DEC/EXE/MEM/WB) with perfect branch prediction and full forwarding. Further assume that the processor can decode and issue 2 instructions/cycle, and that there is a single add/sub ALU, a single load/store ALU, and a multiply ALU. The add/sub ALU also processes branches. The load/store ALU requires 1 extra cycle in the MEM stage to perform a load or store. The multiply ALU is fully pipelined and requires 3 cycles to complete a multiplication. Also assume that the processor supports out-of-order execution with register renaming.

Be sure to show how you obtained your answer, at least including a schedule of stages by instruction and cycle on the next page.

LOOP:
MUL R2,R1,R0
ADD R3,R2,R0
MUL R4,R5,R1
SUB R6,R7,R8
ADD R9,R6,R1
ST R6,0(R10)
ADDI R10,R10,#4
BNE R10,R11,LOOP
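As a starting point for the schedule, it may help to list the read-after-write dependences in the loop and a rough resource bound. The sketch below is only an aid, not the full cycle-by-cycle answer. The opcode-to-unit mapping and the `OCCUPANCY` table are assumptions read off the problem statement (in particular, that a store ties up the load/store unit for 2 cycles and that every other unit accepts one operation per cycle). All names are hypothetical.

```python
import math

# Each entry: (opcode, destination register or None, source registers).
LOOP = [
    ("MUL",  "R2",  ["R1", "R0"]),
    ("ADD",  "R3",  ["R2", "R0"]),
    ("MUL",  "R4",  ["R5", "R1"]),
    ("SUB",  "R6",  ["R7", "R8"]),
    ("ADD",  "R9",  ["R6", "R1"]),
    ("ST",   None,  ["R6", "R10"]),
    ("ADDI", "R10", ["R10"]),
    ("BNE",  None,  ["R10", "R11"]),
]

# Assumed mapping of opcodes to functional units (branches use the add/sub ALU).
UNIT = {"MUL": "mul", "ADD": "addsub", "SUB": "addsub",
        "ADDI": "addsub", "BNE": "addsub", "ST": "ldst"}

# Assumed cycles each operation occupies its unit before the next can start.
OCCUPANCY = {"mul": 1, "addsub": 1, "ldst": 2}
ISSUE_WIDTH = 2


def raw_dependences(loop):
    """Return (producer, consumer, register) triples within one iteration."""
    last_writer = {}
    deps = []
    for i, (_, dest, srcs) in enumerate(loop):
        for reg in srcs:
            if reg in last_writer:
                deps.append((last_writer[reg], i, reg))
        if dest is not None:
            last_writer[dest] = i
    return deps


def resource_lower_bound(loop):
    """Lower bound on steady-state cycles per iteration from resources alone."""
    busy = {}
    for op, _, _ in loop:
        unit = UNIT[op]
        busy[unit] = busy.get(unit, 0) + OCCUPANCY[unit]
    issue_bound = math.ceil(len(loop) / ISSUE_WIDTH)
    return max(issue_bound, *busy.values()), busy


if __name__ == "__main__":
    for producer, consumer, reg in raw_dependences(LOOP):
        print(f"{LOOP[consumer][0]} (instr {consumer}) needs {reg} "
              f"from {LOOP[producer][0]} (instr {producer})")
    bound, busy = resource_lower_bound(LOOP)
    print("unit busy cycles:", busy)
    print("lower bound (cycles/iteration):", bound)
```

Under these assumptions the single add/sub ALU is the bottleneck, since it has to handle five operations per iteration (ADD, SUB, ADD, ADDI, BNE). This gives only a lower bound. The actual per-iteration count still has to come from the stage-by-cycle schedule, which must also account for the 3-cycle multiply latency feeding `ADD R3,R2,R0`.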

Explanation / Answer

Loop: L.D F0, 0(R1) ; F0 = array element
ADD.D F4, F0, F2 ; add scalar
S.D F4, 0(R1) ; store result
DADDUI R1, R1, #-8 ; decrement address pointer
BNE R1, R2, Loop ; branch if R1 != R2
NOP

Loop: L.D F0, 0(R1) ; F0 = array element
stall
ADD.D F4, F0, F2 ; add scalar
stall
stall
S.D F4, 0(R1) ; store result
DADDUI R1, R1, #-8 ; decrement address pointer
stall
BNE R1, R2, Loop ; branch if R1 != R2
stall
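The stalled listing above can be tallied mechanically. The sketch below assumes a single-issue pipeline in which every line of the listing, instruction or stall, occupies exactly one cycle. That is the convention the listing appears to follow, but the answer does not state it. The helper name `count_cycles` is hypothetical.

```python
# Stalled listing copied from the answer, comments removed.
LISTING = """\
L.D F0, 0(R1)
stall
ADD.D F4, F0, F2
stall
stall
S.D F4, 0(R1)
DADDUI R1, R1, #-8
stall
BNE R1, R2, Loop
stall
"""


def count_cycles(listing):
    """Return (instructions, stalls, total cycles) for a one-line-per-cycle listing."""
    lines = [ln.strip() for ln in listing.splitlines() if ln.strip()]
    stalls = sum(1 for ln in lines if ln.lower() == "stall")
    return len(lines) - stalls, stalls, len(lines)


if __name__ == "__main__":
    instructions, stalls, total = count_cycles(LISTING)
    print(f"{instructions} instructions + {stalls} stalls = {total} cycles per iteration")
```

Counted this way, the listing has 5 instructions and 5 stalls, or 10 cycles per iteration. That figure applies to this L.D/ADD.D/S.D loop on a single-issue pipeline. It is not the count for the dual-issue, out-of-order loop in the question.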

