
Consider the following computer: 8 cores (not hyperthreaded), 1 word is 64 bits

ID: 3681486 • Letter: C

Question

Consider the following computer:

- 8 cores (not hyperthreaded); 1 word is 64 bits.

- 3 GHz clock; an instruction buffer negates instruction load penalties, and the computer leaves writes to the cache, negating write penalties.

- 32 KB L1 data cache (per core), 1 cycle access (per word), 8-word blocks, 4-way associative, write-through, 1@ miss ratio in L2

- Connection bus (L1 to L2): 2 words wide

- 6 MB L2 cache, 5 ns access, 32-word blocks, 4-way associative, write-back, 20% miss ratio, 10% dirty

- Connection bus (L2 to RAM): 4 words wide

- 16 GB RAM, 55 ns first access to a new address, ns per word burst rate (repeat access from consecutive addresses)

Assume that an FP instruction takes 3 cycles. The typical instruction mix is one FP double-precision 3-address instruction every 10. Assume every 6 floating-point instructions require 2 reads and 1 write to memory, and non-floating-point instructions require 1 read and 1 write every 10 instructions.

a) Calculate the memory bandwidth required for peak throughput on all 8 cores simultaneously.

b) Assume the memory bus is 4 words wide to the L2 cache (since the L2 cache is shared, all 8 cores read and write to it and there is a single bus). Given this, can the peak throughput calculated above be maintained?

c) The L1-to-L2 buses (1 per core) are only 2 words wide. Is this a problem? Why or why not?

d) This CPU is not pipelined, but does have reservation stations using Tomasulo's algorithm. How many stations capable of floating-point operations do we need in each core? Why?

Explanation / Answer

a) Memory bandwidth on all 8 cores during peak = rate of data transfer = number of megabytes per second

= 32 * 3 * 8 = 768 megabytes per second
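One way to cross-check a bandwidth figure for part (a) is to derive the demand directly from the instruction mix. The sketch below is a rough estimate, not a definitive answer. It rests on assumptions that are not in the question: a non-FP instruction takes 1 cycle, every read or write moves one 64-bit word, "1 read and 1 write every 10 instructions" means every 10 non-FP instructions, and the result is the demand at the core-to-L1 interface before any cache filtering. The function and constant names are hypothetical.

```python
# Parameters taken from the question.
CLOCK_HZ = 3e9          # 3 GHz clock
CORES = 8               # 8 cores, not hyperthreaded
BYTES_PER_WORD = 8      # 1 word is 64 bits
FP_CYCLES = 3           # an FP instruction takes 3 cycles
FP_PER_10 = 1           # one FP instruction in every 10

# ASSUMPTION (not stated in the question): a non-FP instruction takes 1 cycle.
NON_FP_CYCLES = 1


def peak_bandwidth_bytes_per_s():
    """Estimate memory traffic generated by all cores at peak throughput."""
    non_fp_per_10 = 10 - FP_PER_10

    # Cycles needed to execute a typical group of 10 instructions.
    cycles_per_10 = FP_PER_10 * FP_CYCLES + non_fp_per_10 * NON_FP_CYCLES

    # Word accesses per instruction, from the question:
    # 2 reads + 1 write per 6 FP instructions,
    # 1 read + 1 write per 10 non-FP instructions (assumed reading).
    fp_accesses = (2 + 1) / 6
    non_fp_accesses = (1 + 1) / 10

    words_per_10 = FP_PER_10 * fp_accesses + non_fp_per_10 * non_fp_accesses
    words_per_cycle = words_per_10 / cycles_per_10

    per_core = words_per_cycle * CLOCK_HZ * BYTES_PER_WORD
    return per_core * CORES
```

Under these assumptions a group of 10 instructions takes 12 cycles and issues 2.3 word accesses, so the function returns roughly 36.8e9 bytes per second across the 8 cores. A different non-FP cycle count or a different reading of the access ratios changes the result.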

b) Yes, the peak throughput above can be maintained, for the following reasons:

c) The L1-to-L2 buses, limited to a width of 2 words, can give rise to problems because they cannot transfer wider data in a single transfer.
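The effect of bus width in part (c) can be made concrete by counting how many bus transfers a block fill needs. This is a minimal sketch using the block sizes and bus widths given in the question; the helper name `transfers_per_block` is hypothetical, and the count ignores timing details such as per-transfer latency, which the question does not give for the L1-to-L2 bus.

```python
import math


def transfers_per_block(block_words, bus_words):
    """Number of bus transfers needed to move one cache block."""
    return math.ceil(block_words / bus_words)


# L1 block: 8 words over the 2-word L1-to-L2 bus.
L1_FILL_TRANSFERS = transfers_per_block(8, 2)

# L2 block: 32 words over the 4-word L2-to-RAM bus.
L2_FILL_TRANSFERS = transfers_per_block(32, 4)
```

An L1 miss therefore needs 4 transfers on the 2-word bus to fill one 8-word block, and an L2 miss needs 8 transfers on the 4-word bus to fill one 32-word block. Whether that is a problem depends on how often L1 misses, which is why the miss ratio matters for this part.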

d) We need 2^n processors in each pipelined station to achieve the parallel processing ability.
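A separate, rough way to size the FP stations for part (d) is an occupancy estimate: compare how long one FP operation holds a station with how often a new FP instruction arrives. The sketch below is only an illustration under assumptions not stated in the question: a non-FP instruction takes 1 cycle, and FP instructions are evenly spaced in the instruction stream. The function name is hypothetical, and this is not necessarily the derivation the question intends.

```python
import math

FP_LATENCY_CYCLES = 3   # from the question: an FP instruction takes 3 cycles
FP_PER_10 = 1           # from the question: one FP instruction in every 10

# ASSUMPTION (not stated in the question): a non-FP instruction takes 1 cycle.
NON_FP_CYCLES = 1


def fp_stations_needed():
    """Estimate FP-capable stations so a new FP op never waits for a station."""
    # Cycles to execute a typical group of 10 instructions.
    cycles_per_10 = FP_PER_10 * FP_LATENCY_CYCLES + (10 - FP_PER_10) * NON_FP_CYCLES

    # Average cycles between consecutive FP instructions (assumed evenly spaced).
    cycles_between_fp = cycles_per_10 / FP_PER_10

    # Stations needed = how many FP ops can be in flight at once.
    return math.ceil(FP_LATENCY_CYCLES / cycles_between_fp)
```

With these assumptions a new FP instruction arrives about every 12 cycles and each one occupies a station for 3 cycles, so the estimate is 1 station per core. Bursts of back-to-back FP instructions would raise that number.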