Translate the above code (bottom using nested loops) using our DLX vector instru
Question
Translate the above code (bottom using nested loops) using our DLX vector instruction set. Assume:
Vector registers of length 8
Load unit has a startup of L clocks
Adder unit has a startup of A clocks
Multiplier unit has a startup of M clocks
For vectors of length N, compute the number of clock cycles needed to execute the inner loop (the vector operations), first for normal execution and then when chaining of loads/stores/additions/multiplications is allowed. How much speedup do we achieve with chaining?
low = 0;
VL = (n % MVL); /* find odd-size piece using modulo op % */
for (j = 0; j <= (n/MVL); j = j+1) { /* outer loop */
    for (i = low; i < (low+VL); i = i+1) /* runs for length VL */
        Y[i] = a * X[i] + Y[i]; /* main operation */
    low = low + VL; /* start of next vector */
    VL = MVL; /* reset the length to maximum vector length */
}
Explanation / Answer
• LOOP: LD R4, 0(R1) 11
• LD R5, 0(R2) 11
• ADD R6, R4, R5 4
• SR R6, R6, 1 1
• ST R6, 0(R3) 11
• ADDI R1,R1,4 1
• ADDI R2, R2, 4 1
• ADDI R3, R3, 4 1
• ADDI R0, R0, 1 1
• BEQZ R0, LOOP 2
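• Chaining: a dependent vector instruction does not have to wait until the whole source vector register has been written. It can start as soon as the first element of that register is ready, so the producer and the consumer execute overlapped instead of back to back.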
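To make the effect of chaining concrete, here is a minimal C sketch of one possible cycle-count model. It is not the answer's method and rests on several assumptions that the question does not state: the loop body is vectorized as the usual DAXPY sequence (LV, MULTSV, LV, ADDV, SV, using DLXV mnemonics from Hennessy and Patterson), there is a single load/store unit, a store has the same startup L as a load, each convoy costs its startup plus VL cycles, convoys do not overlap, and scalar loop overhead is ignored. The function names are hypothetical.

```c
#include <assert.h>

#define MVL 8 /* vector register length given in the question */

/* One strip of vl elements WITHOUT chaining.
   Assumed convoys: LV | MULTSV + LV | ADDV | SV.
   MULTSV and the second LV are independent, so that convoy's
   startup is the larger of M and L. */
long strip_cycles_unchained(long vl, long L, long A, long M) {
    long second = (M > L) ? M : L;
    return (L + vl) + (second + vl) + (A + vl) + (L + vl);
}

/* One strip of vl elements WITH chaining.
   Assumed convoys: LV -> MULTSV | LV -> ADDV | SV.
   Inside a chained convoy the startups add up, but the vl
   element cycles are paid only once. */
long strip_cycles_chained(long vl, long L, long A, long M) {
    return (L + M + vl) + (L + A + vl) + (L + vl);
}

/* Total cycles for a vector of length n, strip-mined exactly like the
   loop in the question: first the odd-size piece (n % MVL), then
   full strips of MVL elements. Empty strips cost nothing here. */
long total_cycles(long n, long L, long A, long M, int chained) {
    assert(n >= 0);
    long total = 0;
    long vl = n % MVL;
    for (long j = 0; j <= n / MVL; j++) {
        if (vl > 0) {
            total += chained ? strip_cycles_chained(vl, L, A, M)
                             : strip_cycles_unchained(vl, L, A, M);
        }
        vl = MVL; /* every later strip uses the full vector length */
    }
    return total;
}

/* Speedup of chained over unchained execution under this model. */
double chaining_speedup(long n, long L, long A, long M) {
    assert(n > 0);
    return (double)total_cycles(n, L, A, M, 0) /
           (double)total_cycles(n, L, A, M, 1);
}
```

Under these assumptions one strip costs 2L + max(M, L) + A + 4·VL cycles without chaining and 3L + M + A + 3·VL cycles with chaining. For example, VL = 8, L = 4, A = 2, M = 3 gives 46 versus 41 cycles. With registers this short and large startup values, the model can show little or no gain from chaining, because chaining saves one pass of VL cycles but serializes the startups of the chained units.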