Question
The latencies of individual stages in five-stage MIPS (Microprocessor without Interlocked Pipeline Stages) Architecture are given below.
Stage                          Latency
Instruction Fetch              200 ps
Register Read                  100 ps
Arithmetic Logic Unit (ALU)    200 ps
Memory Access                  300 ps
Register Write                 100 ps
(10 pts) What is the clock cycle time in a pipelined and non-pipelined processor?
Pipelined version : ______________
Non-pipelined version : ______________
The classic five-stage MIPS pipeline is used to execute the code fragments below. Assume the following:
Register write is done in the first half of the clock cycle; register read is performed in the second half of the clock cycle.
Branches are resolved in the fourth stage of the pipeline, and the architecture does not use any branch prediction mechanism.
Forwarding is not supported.
(5 pts) Assuming there is no dependence other than the one(s) given in the code, show the pipeline diagram.
Clock Cycle →  1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16
add R1, R2, R3
add R4, R5, R6
beq R1, R4, target
I4
(5 pts) Assuming there is no dependence other than the one(s) given in the code, show the pipeline diagram.
Clock Cycle →  1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16
add R4, R5, R6
lw R1, 0(R2)
beq R1, R4, target
I4
(5 pts) Assuming there is no dependence other than the one(s) given in the code, show the pipeline diagram.
Clock Cycle →  1  2  3  4  5  6  7  8  9  10  11  12  13
add R1, R2, R3
add R1, R1, R4
add R1, R1, R5
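One way to check a hand-drawn diagram for a fragment like this is a small scheduling sketch. The function below is a hypothetical helper, not part of the original question. It assumes one cycle per stage, in-order issue, no forwarding, and that a consumer may read a register in ID during the same cycle its producer is in WB (write in the first half, read in the second half). It handles only register read-after-write hazards; branches and structural hazards are not modeled.

```python
def schedule_without_forwarding(program):
    """program: list of (dest, sources) register names in program order.

    Returns one dict per instruction. "IF" is the cycle the instruction is
    fetched; "ID" is the cycle its registers are actually read; "EX", "MEM"
    and "WB" are the cycles spent in those stages.
    """
    last_writeback = {}          # register -> cycle its newest value is written
    timeline = []
    prev_id_enter = 0
    prev_id_done = 0
    for index, (dest, sources) in enumerate(program):
        # The next instruction is fetched when the previous one enters ID.
        if_start = 1 if index == 0 else prev_id_enter
        # It enters ID once fetched and once the previous instruction left ID.
        id_enter = max(if_start + 1, prev_id_done + 1)
        # Without forwarding it must wait in ID until every producer is in WB.
        ready = max((last_writeback.get(reg, 0) for reg in sources), default=0)
        id_done = max(id_enter, ready)
        stages = {"IF": if_start, "ID": id_done, "EX": id_done + 1,
                  "MEM": id_done + 2, "WB": id_done + 3}
        if dest is not None:
            last_writeback[dest] = stages["WB"]
        timeline.append(stages)
        prev_id_enter, prev_id_done = id_enter, id_done
    return timeline


# add R1, R2, R3 / add R1, R1, R4 / add R1, R1, R5
FRAGMENT = [("R1", ("R2", "R3")), ("R1", ("R1", "R4")), ("R1", ("R1", "R5"))]
for stages in schedule_without_forwarding(FRAGMENT):
    print(stages)
```

Under these assumptions each dependent add waits in ID until the previous add reaches WB, so the three instructions write back in cycles 5, 8, and 11.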
(5 pts) Assuming there is no dependence other than the one(s) given in the code, show the pipeline diagram.
Clock Cycle →  1  2  3  4  5  6  7  8  9  10  11  12  13
lw R1, 4(R2)
sw R1, 0(R2)
The classic five-stage MIPS pipeline is used to execute the code fragments below. Assume the following:
Register write is done in the first half of the clock cycle; register read is performed in the second half of the clock cycle.
Branches are resolved in the second stage of the pipeline, and the architecture does not use any branch prediction mechanism.
Forwarding is fully supported.
(5 pts) Assuming there is no dependence other than the one(s) given in the code, show the pipeline diagram.
Clock Cycle →  1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16
add R1, R2, R3
add R4, R5, R6
beq R1, R4, target
I4
(5 pts) Assuming there is no dependence other than the one(s) given in the code, show the pipeline diagram.
Clock Cycle →  1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16
add R4, R5, R6
lw R1, 0(R2)
beq R1, R4, target
I4
(5 pts) Assuming there is no dependence other than the one(s) given in the code, show the pipeline diagram.
Clock Cycle →  1  2  3  4  5  6  7  8  9  10  11  12  13
add R1, R2, R3
add R1, R1, R4
add R1, R1, R5
(5 pts) Assuming there is no dependence other than the one(s) given in the code, show the pipeline diagram.
Clock Cycle →  1  2  3  4  5  6  7  8  9  10  11  12  13
lw R1, 4(R2)
sw R1, 0(R2)
a) (18 pts) A 64 KB L1 cache has a 32 byte block size and is 8-way set-associative.
How many sets does the cache have?
How many bits are used for the offset, index, and tag, assuming that the CPU provides 32-bit addresses?
How large is the tag array including valid bit?
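The arithmetic behind these three sub-questions can be sketched as follows. The function name and its return keys are illustrative, not from the question. The sketch assumes 1 KB = 1024 bytes, byte addressing, power-of-two sizes, and that each tag entry stores only the tag plus one valid bit (no dirty or replacement bits).

```python
def cache_geometry(cache_bytes, block_bytes, ways, address_bits):
    """Derive set count, address field widths, and tag array size."""
    sets = cache_bytes // (block_bytes * ways)
    # For powers of two, bit_length() - 1 is an exact log2.
    offset_bits = block_bytes.bit_length() - 1
    index_bits = sets.bit_length() - 1
    tag_bits = address_bits - index_bits - offset_bits
    # One (tag + valid bit) entry per block in the cache.
    tag_array_bits = sets * ways * (tag_bits + 1)
    return {
        "sets": sets,
        "offset_bits": offset_bits,
        "index_bits": index_bits,
        "tag_bits": tag_bits,
        "tag_array_bits": tag_array_bits,
    }


geometry = cache_geometry(cache_bytes=64 * 1024, block_bytes=32,
                          ways=8, address_bits=32)
print(geometry)
```

With these inputs the sketch gives 256 sets, a 5-bit offset, an 8-bit index, a 19-bit tag, and a tag array of 40960 bits (5 KB).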
b) (16 pts) Consider a program that can execute with no stalls and a CPI of 1 if the underlying processor can service every load instruction with a 2-cycle L1 cache hit. In practice, 10% of all load instructions suffer from an L1 cache miss. Every cache miss results in a 300-cycle stall while data is fetched from memory. What is the CPI for this program if 20% of the program's instructions are load instructions?
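A minimal sketch of the CPI calculation for part b), assuming the 2-cycle hit is already covered by the base CPI of 1 and that each miss adds its full 300-cycle stall. The function name is illustrative, and percentages are passed as integers to keep the arithmetic exact.

```python
def effective_cpi(base_cpi, load_percent, miss_percent, miss_penalty_cycles):
    """Base CPI plus the average stall cycles per instruction from load misses."""
    stall_per_instruction = load_percent * miss_percent * miss_penalty_cycles / 10000
    return base_cpi + stall_per_instruction


print(effective_cpi(base_cpi=1, load_percent=20, miss_percent=10,
                    miss_penalty_cycles=300))
```

Here 20% of instructions are loads and 10% of those miss, so 2% of all instructions stall for 300 cycles, adding 6 cycles per instruction on average for a CPI of 7.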
c) (16 pts) Consider an L1 cache that has 16 sets, is direct-mapped (1-way), and supports a block size of 16 bytes. For the following memory access pattern (shown as byte addresses), show which accesses are hits and misses. For each case, indicate the set number.
0, 8, 16, 24, 32, 40, 48, 256, 28, 8, 36, 12, 20, 260.
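The access pattern in part c) can be traced with a short simulation. This is a sketch that assumes the cache starts empty; the function name is illustrative. For each byte address, the block number is the address divided by the block size, the set is the block number modulo the number of sets, and the tag is the remaining high part.

```python
def simulate_direct_mapped(addresses, num_sets=16, block_bytes=16):
    """Return (address, set index, 'hit' or 'miss') for each access."""
    stored_tag = [None] * num_sets   # one block per set in a direct-mapped cache
    trace = []
    for addr in addresses:
        block = addr // block_bytes
        set_index = block % num_sets
        tag = block // num_sets
        hit = stored_tag[set_index] == tag
        stored_tag[set_index] = tag  # on a miss the new block replaces the old one
        trace.append((addr, set_index, "hit" if hit else "miss"))
    return trace


ACCESSES = [0, 8, 16, 24, 32, 40, 48, 256, 28, 8, 36, 12, 20, 260]
for addr, set_index, outcome in simulate_direct_mapped(ACCESSES):
    print(f"{addr:>3}  set {set_index}  {outcome}")
```

Addresses 0 and 256 both map to set 0 with different tags, so they evict each other; that is why the second access to address 8 and the final access to 260 miss.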
Explanation / Answer
Pipelined: the cycle time is set by the slowest stage (Memory Access), so it is 300ps.
Non-pipelined: the cycle time is the sum of all stage latencies: 200 + 100 + 200 + 300 + 100 = 900ps.
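The same two rules can be written as a short sketch. It uses the stage latencies from the table in the question and ignores any pipeline register overhead, which the question does not mention; the function and constant names are illustrative.

```python
# Stage latencies in picoseconds, taken from the table in the question.
STAGE_LATENCY_PS = {
    "Instruction Fetch": 200,
    "Register Read": 100,
    "ALU": 200,
    "Memory Access": 300,
    "Register Write": 100,
}


def pipelined_cycle_time(latencies):
    """Every stage must fit in one cycle, so the slowest stage sets the clock."""
    return max(latencies.values())


def non_pipelined_cycle_time(latencies):
    """One instruction passes through all stages within a single cycle."""
    return sum(latencies.values())


print(pipelined_cycle_time(STAGE_LATENCY_PS))      # 300
print(non_pipelined_cycle_time(STAGE_LATENCY_PS))  # 900
```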
Please post each of the remaining questions as a separate question.