
Given the following single-cycle "Minimal MIPS" datapath, answer the questions


Question

Given the following single-cycle "Minimal MIPS" datapath, answer the questions that follow:

Diagram or explain the hardware changes needed to support the following instructions. Be sure to account for both the data hardware and the control signal changes needed.
jr (jump to register; copies the value of a register into PC)
jal (jump and link; copies the value of PC+4 into a register, then jumps to a new address)

Now we want to convert our datapath to a multi-cycle (but NOT pipelined) version. For both of those instructions (jr and jal), explain what, if anything, they would do during the EX, MEM, and WB phases of execution.

Explanation / Answer


$$\text{CPU time}_{X,P} = \text{Instructions executed}_{P} \times \text{CPI}_{X,P} \times \text{Clock cycle time}_{X}$$

Instructions executed:

We are not interested in the static instruction count, or how many lines of code are in a program.
Instead we care about the dynamic instruction count, or how many instructions are actually executed when the program runs.

There are three lines of code below, but the number of instructions executed would be 2001. The li runs once, and the two-instruction loop body runs 1000 times.

        li    $a0, 1000
Ostrich:
        addi  $a0, $a0, -1
        bne   $a0, $0, Ostrich
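The 2001 count can be checked with a small Python model of the loop. This is a sketch that assumes each line assembles to exactly one machine instruction. The function name `count_dynamic_instructions` is hypothetical.

```python
def count_dynamic_instructions(start: int) -> int:
    """Simulate the countdown loop and count every instruction it executes."""
    if start < 1:
        raise ValueError("loop assumes $a0 starts at 1 or more")
    executed = 0
    a0 = start            # li  $a0, start
    executed += 1
    while True:           # Ostrich:
        a0 -= 1           # decrement $a0 by 1
        executed += 1
        executed += 1     # bne $a0, $0, Ostrich
        if a0 == 0:       # branch not taken: fall out of the loop
            break
    return executed


static_count = 3                                   # lines of code
dynamic_count = count_dynamic_instructions(1000)   # instructions executed
print(static_count, dynamic_count)                 # 3 2001
```

The static count stays at 3 regardless of the starting value. The dynamic count grows as 1 + 2 × start.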


CPI:

The average number of clock cycles per instruction, or CPI, is a function of the machine and the program.

The CPI depends on the actual instructions appearing in the program: a floating-point intensive application might have a higher CPI than an integer-based program.

It also depends on the CPU implementation. For example, a Pentium

can execute the same instructions as an older 80486, but faster.

So far we have assumed that each instruction takes one cycle, so CPI = 1.
The CPI can be > 1 due to memory stalls and slow instructions.
The CPI can be < 1 on machines that execute more than one instruction per cycle (superscalar).
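Because CPI is an average, a program's CPI can be estimated as a weighted sum over its instruction mix. The Python sketch below shows why a floating-point heavy program can end up with a higher CPI. The instruction classes, fractions, and per-class cycle counts are made-up assumptions, not measurements of any real machine.

```python
def average_cpi(mix: dict[str, tuple[float, float]]) -> float:
    """Weighted average CPI from {class: (fraction_of_instructions, cycles_per_instruction)}."""
    total_fraction = sum(frac for frac, _ in mix.values())
    if abs(total_fraction - 1.0) > 1e-9:
        raise ValueError("instruction fractions must sum to 1")
    return sum(frac * cycles for frac, cycles in mix.values())


# Hypothetical instruction mixes; all numbers are illustrative only.
integer_program = {"alu": (0.6, 1.0), "load_store": (0.3, 2.0), "branch": (0.1, 1.5)}
fp_program = {"alu": (0.3, 1.0), "load_store": (0.3, 2.0), "fp": (0.4, 4.0)}

print(f"integer: {average_cpi(integer_program):.2f}, fp: {average_cpi(fp_program):.2f}")
```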


Clock cycle time:

One cycle is the minimum time it takes the CPU to do any work.
The clock cycle time, or clock period, is just the length of a cycle.
The clock rate, or frequency, is the reciprocal of the cycle time.
Generally, a higher frequency is better.

Some examples of typical frequencies:
A 500 MHz processor has a cycle time of 2 ns.
A 2 GHz (2000 MHz) CPU has a cycle time of just 0.5 ns (500 ps).
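The reciprocal relationship between clock rate and cycle time can be checked with a few lines of Python. The helper name `cycle_time_ns` is an illustrative assumption.

```python
def cycle_time_ns(frequency_hz: float) -> float:
    """Clock cycle time in nanoseconds: the reciprocal of the clock rate."""
    if frequency_hz <= 0:
        raise ValueError("frequency must be positive")
    return 1e9 / frequency_hz


for label, hz in [("500 MHz", 500e6), ("2 GHz", 2e9)]:
    ns = cycle_time_ns(hz)
    print(f"{label}: {ns:g} ns ({ns * 1000:g} ps)")
# 500 MHz: 2 ns (2000 ps)
# 2 GHz: 0.5 ns (500 ps)
```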


Execution time, again:

$$\text{CPU time}_{X,P} = \text{Instructions executed}_{P} \times \text{CPI}_{X,P} \times \text{Clock cycle time}_{X}$$

The easiest way to remember this is to match up the units:

$$\frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Clock cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Clock cycle}}$$

Make things faster by making any component smaller!
It is often easy to reduce one component by increasing another.

[Table: which of Program, Compiler, ISA, Organization, and Technology affect Instructions executed, CPI, and Clock cycle time]
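The performance equation translates directly into code. The sketch below also illustrates the trade-off of shrinking one component by growing another. Designs A and B and all their numbers are hypothetical: B cuts the cycle time to a fifth but raises the CPI fourfold.

```python
def cpu_time_s(instructions: float, cpi: float, cycle_time_s: float) -> float:
    """CPU time = instructions executed * CPI * clock cycle time (seconds)."""
    return instructions * cpi * cycle_time_s


# Hypothetical designs running the same 1e9-instruction program.
design_a = cpu_time_s(1e9, cpi=1.0, cycle_time_s=10e-9)
design_b = cpu_time_s(1e9, cpi=4.0, cycle_time_s=2e-9)
print(f"A: {design_a:.1f} s, B: {design_b:.1f} s")   # A: 10.0 s, B: 8.0 s
```

B wins here only because its cycle-time reduction outweighs its CPI increase. With a CPI above 5.0, B would be slower.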

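Example 1: ISA-compatible processors

Let's compare the performance of two x86-based processors:
An 800 MHz AMD Duron, with a CPI of 1.2 for an MP3 compressor.
A 1 GHz Pentium III, with a CPI of 1.5 for the same program.

Compatible processors implement identical instruction sets and will use the same executable files, with the same number of instructions.
But they implement the ISA differently, which leads to different CPIs.

$$\text{CPU time}_{\text{AMD},P} = \text{Instructions}_{P} \times \text{CPI}_{\text{AMD},P} \times \text{Cycle time}_{\text{AMD}} = \text{Instructions}_{P} \times 1.2 \times 1.25\ \text{ns} = 1.5\ \text{ns} \times \text{Instructions}_{P}$$

$$\text{CPU time}_{\text{P3},P} = \text{Instructions}_{P} \times \text{CPI}_{\text{P3},P} \times \text{Cycle time}_{\text{P3}} = \text{Instructions}_{P} \times 1.5 \times 1\ \text{ns} = 1.5\ \text{ns} \times \text{Instructions}_{P}$$

Both processors therefore take the same time to run this program.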
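Because the instruction count is identical on ISA-compatible processors, it is enough to compare CPI × cycle time. This Python sketch does that for the two processors in the example. The helper name is hypothetical.

```python
def time_per_instruction_ns(cpi: float, clock_hz: float) -> float:
    """CPI * cycle time in ns; multiply by Instructions_P to get CPU time."""
    return cpi * 1e9 / clock_hz


amd = time_per_instruction_ns(cpi=1.2, clock_hz=800e6)   # 800 MHz AMD Duron
p3 = time_per_instruction_ns(cpi=1.5, clock_hz=1e9)      # 1 GHz Pentium III
print(f"AMD: {amd:.2f} ns/instr, P3: {p3:.2f} ns/instr")  # AMD: 1.50 ns/instr, P3: 1.50 ns/instr
```

The lower CPI of the Duron exactly offsets its lower clock rate for this program.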


[Figure: How the add goes through the datapath. Single-cycle MIPS datapath (PC, PC+4 and branch adders, instruction memory, register file, sign extend, shift left 2, ALU, data memory; control signals RegDst, ALUSrc, ALUOp, MemRead, MemWrite, MemToReg, RegWrite, PCSrc), annotated for an add with I[25-21] = 01001, I[20-16] = 01010, I[15-11] = 10100 and register values 00...01, 00...10, and result 00...11.]
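The figure's walkthrough of add can be sketched as a simplified Python model of one single-cycle step. This is a hypothetical illustration, not the course's datapath. The function names, the starting PC of 0, and the assignment of 00...01 to $9 and 00...10 to $10 are all assumptions; only the field values 01001, 01010, and 10100 come from the figure. The control values used are the usual R-type settings (RegDst = 1, ALUSrc = 0, MemToReg = 0, RegWrite = 1, PCSrc = 0).

```python
def sign_extend_16(value: int) -> int:
    """Sign-extend a 16-bit field to a Python int."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def run_add(word: int, regs: list[int], pc: int) -> int:
    """Push one R-type add through a simplified single-cycle datapath; return the next PC."""
    opcode = (word >> 26) & 0x3F
    rs = (word >> 21) & 0x1F          # I[25-21] -> Read register 1
    rt = (word >> 16) & 0x1F          # I[20-16] -> Read register 2
    rd = (word >> 11) & 0x1F          # I[15-11] -> RegDst mux input 1
    funct = word & 0x3F
    if opcode != 0 or funct != 0x20:
        raise ValueError("this sketch only models R-type add")

    # Control signals for an R-type instruction (MemRead = MemWrite = 0).
    reg_dst, alu_src, mem_to_reg, reg_write, pc_src = 1, 0, 0, 1, 0

    read_data_1, read_data_2 = regs[rs], regs[rt]
    imm = sign_extend_16(word)                         # I[15-0], unused by add
    alu_b = imm if alu_src else read_data_2            # ALUSrc mux
    alu_result = (read_data_1 + alu_b) & 0xFFFFFFFF    # ALU adds; overflow trap ignored
    mem_read_data = 0                                  # data memory is not read for add
    write_data = mem_read_data if mem_to_reg else alu_result  # MemToReg mux
    write_reg = rd if reg_dst else rt                  # RegDst mux
    if reg_write and write_reg != 0:                   # $0 stays zero
        regs[write_reg] = write_data

    pc_plus_4 = pc + 4
    branch_target = pc_plus_4 + (imm << 2)             # shift left 2 + branch adder
    return branch_target if pc_src else pc_plus_4      # PCSrc mux


# add $20, $9, $10 encoded from the figure's fields (shamt = 0, funct = 0x20).
word = (0b01001 << 21) | (0b01010 << 16) | (0b10100 << 11) | 0x20
regs = [0] * 32
regs[9], regs[10] = 1, 2      # hypothetical values matching 00...01 and 00...10
next_pc = run_add(word, regs, pc=0)
print(regs[20], next_pc)      # 3 4
```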

It gets worse...

We've made very optimistic assumptions about memory latency:
Main memory accesses on modern machines take >50 ns.
For comparison, an ALU on an AMD Opteron takes ~0.3 ns.
Our worst-case cycle (loads/stores) includes 2 memory accesses.
A modern single-cycle implementation would be stuck at <10 MHz.
Caches improve the common-case access time, not the worst case.

Tying frequency to the worst-case path violates the first law of performance!
Make the common case fast (we'll revisit this often).
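The <10 MHz bound follows from the worst-case path: two memory accesses of about 50 ns each already give a period of about 100 ns. The Python sketch below adds the ~0.3 ns ALU delay and, as a simplifying assumption, ignores every other delay (register file, muxes, wires). Including them would only push the clock lower. The helper name is hypothetical.

```python
def max_clock_mhz(memory_ns: float, memory_accesses: int, other_ns: float = 0.0) -> float:
    """Upper bound on a single-cycle clock (MHz) set by the worst-case path delay."""
    period_ns = memory_accesses * memory_ns + other_ns
    return 1e3 / period_ns


# Load: instruction fetch + data access, plus the ALU; other delays ignored.
print(f"{max_clock_mhz(50.0, 2, other_ns=0.3):.2f} MHz")   # 9.97 MHz
```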

Summary

Performance is one of the most important criteria in judging systems. Here we'll focus on execution time.

Our main performance equation explains how performance depends on several factors related to both hardware and software.

$$\text{CPU time}_{X,P} = \text{Instructions executed}_{P} \times \text{CPI}_{X,P} \times \text{Clock cycle time}_{X}$$

It can be hard to measure these factors in real life, but this is a useful guide for comparing systems and designs.

A single-cycle CPU has two main disadvantages:
The cycle time is limited by the worst-case latency.
It isn't efficiently using its hardware.

Next time, we'll see how this can be rectified with pipelining.
