Consider a memory system with a level 1 cache of 32 KB and DRAM of 512 MB with t

ID: 3530194 • Letter: C

Question

Consider a memory system with a level 1 cache of 32 KB and DRAM of 512 MB with the processor operating at 1 GHz. The latency to L1 cache is one cycle and the latency to DRAM is 100 cycles. In each memory cycle, the processor fetches four words (cache line size is four words). What is the peak achievable performance of a dot product of two vectors? Note: Where necessary, assume an optimal cache placement policy.

    /* dot product loop */
    for (i = 0; i < dim; i++)
        dot_prod += a[i] * b[i];

Now consider the problem of multiplying a dense matrix with a vector using a two-loop dot-product formulation. The matrix is of dimension 4K x 4K. (Each row of the matrix takes 16 KB of storage.) What is the peak achievable performance of this technique using a two-loop dot-product based matrix-vector product?

    /* matrix vector product loop */
    for (i = 0; i < dim; i++)
        for (j = 0; j < dim; j++)
            c[i] += a[i][j] * b[j];

Explanation / Answer

Answer:

The processor operates at 1 GHz, so the time per cycle is 1 / (1 × 10^9) s = 1 ns.

The latency to L1 cache is one cycle (1 ns) and the latency to DRAM is 100 cycles, i.e. 100 / (1 × 10^9) s = 100 ns.

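Each iteration of the dot-product loop performs two floating-point operations (one multiply and one add). Since a cache line holds four words, one DRAM access for a and one for b bring in the operands for four iterations, i.e. 8 FLOPs. Each vector element is used only once, so there is no reuse from the cache, and those two accesses take 2 × 100 ns = 200 ns.

So the peak achievable performance of the dot product is 8 FLOPs / 200 ns = 40 MFLOPS.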

Now consider multiplying a dense matrix with a vector using a two-loop dot-product formulation. The matrix is of dimension 4K x 4K, and each row of the matrix takes 16 KB of storage, so each word is 4 bytes.

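The vector b has 4K words, i.e. 16 KB, so with an optimal placement policy it fits in the 32 KB L1 cache and stays there after it is first loaded; only the rows of a have to be streamed from DRAM. Each DRAM access (100 ns) now brings in four elements of a, while the matching four elements of b come from the cache. Four iterations (8 FLOPs) therefore take about 100 ns, and the one-time cost of loading b is amortized over all 4K rows.

So the peak achievable performance is about 8 FLOPs / 100 ns = 80 MFLOPS, twice that of the plain dot product.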
