Academic Integrity: tutoring, explanations, and feedback — we don’t complete graded work or submit on a student’s behalf.

Consider a pipelined RISC system for which the relevant data for this system is

ID: 3770903 • Letter: C

Question

Consider a pipelined RISC system for which the relevant data for this system is given below:
• System has separate I-cache and D-cache
• Both caches have a hit rate of 100%
• The default latency for each cache is one cycle
• I-cache can supply one instruction per cycle to the processor
• Only load/store instructions access the D-cache
• Assume that the dynamic instruction mix for an application is as follows:
ALU 40% (register-register or ‘R-type’ only)
Loads 25%
Stores 15%
Branches 20%
• The dynamic mix of branches has the following statistics:
Unconditional branches (always taken) 33.3%
Conditional branches (taken) 33.3%
Conditional branches (not taken) 33.3%
• The statistics pertaining to compiler load scheduling (statically moving loads back in
order to reduce stall cycles or delay slots) is as shown below:
Loads that cannot be scheduled or moved back (no delay slot filled) 25%
Loads that can be moved back by up to two instructions (one or two) 65%
4
Loads that can be moved back by one instruction (one delay slot filled) 10%
• The statistics pertaining for compiler branch scheduling is as follows:
Unconditional branches 100% schedulable (one delay slot filled)
Conditional branches 50% schedulable (one delay slot filled)
Initially, it is assumed that there is no ALU penalty and that the load and branch penalties are
both two cycles (two for load and two for branch).
c.Now assume that compiler scheduling is introduced. The relevant statistics are given in
the initial text of the question. What will be the resultant CPI now?
d. Now assume that PC-relative addressing is employed by introducing a separate adder for
generating the branch target address so that branch target address can be calculated
without accessing the register file whenever possible. Assume that 90% of all branches
can be made PC-relative. Consider all the effects of this change and calculate the
resultant CPI

Explanation / Answer

Penalty Reduction via Forwarding Paths

So far the only mechanism available for dealing with hazard resolution is:

Stalling the dependent trailing instruction

nsuring that the writing and reading of the hazard register are done in their normal sequential order

More aggressive techniques are available

The incorporation of forwarding paths in the pipeline

To help reduce the penalty cycles incurred by pipeline hazards

Assume the leading instruction i is the instruction on which the trailing instruction j depends

lFor RAW hazards, instruction j needs the result of instruction i for its operand

If the leading instruction i is an ALU instruction, the result needed by instruction j is actually available when i completes the ALU stage

The operand needed by instruction j is actually available at the output of the ALU stage when instruction i exits the ALU stage

j needs not wait two more cycles for i to exit the WB stage

This reduces the worst-case penalty down to just one cycle

When the dependent instruction is instruction i + 1, i.e., j = i + 1

When instruction i is in the ALU stage, instruction i + 1 will be in the RD stage

When instruction i advances to the MEM stage, instruction i + 1 must be held back at the RD stage via stalling the earlier stages of the pipeline

Implementation of Pipeline Interlock

The resolving of pipeline hazards via hardware mechanisms

Pipeline interlock hardware must detect all pipeline hazards and ensure that all the dependences are satisfied

Pipeline interlock can involve stalling certain stages of the pipeline as well as controlling the forwarding of data via the forwarding paths

With the addition of forwarding paths, the scalar pipeline is no longer a simple sequence of pipeline stages

Data flowing is not from the first stage to the last stage

Consider the physical implementation of the logical diagram

The trailing instruction j is currently in the RD stage attempting to read its two register operands

The first two comparators (to the left) are checking for possible RAW dependences between instruction j and instruction j - 1, which is in the ALU stage

These two comparators are comparing the two source register specifiers of j with the destination register specifier of j – 1

When the register file is designed to perform first a write and then a read in each cycle

A dependence between instructions i and i + 3 is automatically satisfied when they traverse the WB and RD stages

The output of the first pair of comparators is used along with a signal from the ID stage

Indicating that the leading instruction is a load to produce a control signal for stalling the first three stages of the pipeline for one cycle

A dependence is detected between instructions i and i + 1, where instruction i is a load

Hire Me For All Your Tutoring Needs
Integrity-first tutoring: clear explanations, guidance, and feedback.
Drop an Email at
drjack9650@gmail.com
Chat Now And Get Quote