Consider a pipelined RISC system for which the relevant data for this system is
ID: 3770903 • Letter: C
Question
Consider a pipelined RISC system for which the relevant data for this system is given below:
• System has separate I-cache and D-cache
• Both caches have a hit rate of 100%
• The default latency for each cache is one cycle
• I-cache can supply one instruction per cycle to the processor
• Only load/store instructions access the D-cache
• Assume that the dynamic instruction mix for an application is as follows:
ALU 40% (register-register or ‘R-type’ only)
Loads 25%
Stores 15%
Branches 20%
• The dynamic mix of branches has the following statistics:
Unconditional branches (always taken) 33.3%
Conditional branches (taken) 33.3%
Conditional branches (not taken) 33.3%
• The statistics pertaining to compiler load scheduling (statically moving loads back in
order to reduce stall cycles or delay slots) is as shown below:
Loads that cannot be scheduled or moved back (no delay slot filled) 25%
Loads that can be moved back by up to two instructions (one or two) 65%
4
Loads that can be moved back by one instruction (one delay slot filled) 10%
• The statistics pertaining for compiler branch scheduling is as follows:
Unconditional branches 100% schedulable (one delay slot filled)
Conditional branches 50% schedulable (one delay slot filled)
Initially, it is assumed that there is no ALU penalty and that the load and branch penalties are
both two cycles (two for load and two for branch).
c.Now assume that compiler scheduling is introduced. The relevant statistics are given in
the initial text of the question. What will be the resultant CPI now?
d. Now assume that PC-relative addressing is employed by introducing a separate adder for
generating the branch target address so that branch target address can be calculated
without accessing the register file whenever possible. Assume that 90% of all branches
can be made PC-relative. Consider all the effects of this change and calculate the
resultant CPI
Explanation / Answer
Penalty Reduction via Forwarding Paths
So far the only mechanism available for dealing with hazard resolution is:
Stalling the dependent trailing instruction
nsuring that the writing and reading of the hazard register are done in their normal sequential order
More aggressive techniques are available
The incorporation of forwarding paths in the pipeline
To help reduce the penalty cycles incurred by pipeline hazards
Assume the leading instruction i is the instruction on which the trailing instruction j depends
lFor RAW hazards, instruction j needs the result of instruction i for its operand
If the leading instruction i is an ALU instruction, the result needed by instruction j is actually available when i completes the ALU stage
The operand needed by instruction j is actually available at the output of the ALU stage when instruction i exits the ALU stage
j needs not wait two more cycles for i to exit the WB stage
This reduces the worst-case penalty down to just one cycle
When the dependent instruction is instruction i + 1, i.e., j = i + 1
When instruction i is in the ALU stage, instruction i + 1 will be in the RD stage
When instruction i advances to the MEM stage, instruction i + 1 must be held back at the RD stage via stalling the earlier stages of the pipeline
Implementation of Pipeline Interlock
The resolving of pipeline hazards via hardware mechanisms
Pipeline interlock hardware must detect all pipeline hazards and ensure that all the dependences are satisfied
Pipeline interlock can involve stalling certain stages of the pipeline as well as controlling the forwarding of data via the forwarding paths
With the addition of forwarding paths, the scalar pipeline is no longer a simple sequence of pipeline stages
Data flowing is not from the first stage to the last stage
Consider the physical implementation of the logical diagram
The trailing instruction j is currently in the RD stage attempting to read its two register operands
The first two comparators (to the left) are checking for possible RAW dependences between instruction j and instruction j - 1, which is in the ALU stage
These two comparators are comparing the two source register specifiers of j with the destination register specifier of j – 1
When the register file is designed to perform first a write and then a read in each cycle
A dependence between instructions i and i + 3 is automatically satisfied when they traverse the WB and RD stages
The output of the first pair of comparators is used along with a signal from the ID stage
Indicating that the leading instruction is a load to produce a control signal for stalling the first three stages of the pipeline for one cycle
A dependence is detected between instructions i and i + 1, where instruction i is a load
Related Questions
drjack9650@gmail.com
Navigate
Integrity-first tutoring: explanations and feedback only — we do not complete graded work. Learn more.