Consider a memory system with a level 1 cache of 32 KB and DRAM of 512 MB with t

Question

Consider a memory system with a level 1 cache of 32 KB and DRAM of 512 MB with the processor operating at 1 GHz. The latency to L1 cache is one cycle and the latency to DRAM is 100 cycles. In each memory cycle, the processor fetches four words (cache line size is four words). What is the peak achievable performance of a dot product of two vectors? Note: Where necessary, assume an optimal cache placement policy.

/* dot product loop */
for (i = 0; i < dim; i++)
    dot_prod += a[i] * b[i];

Now consider the problem of multiplying a dense matrix with a vector using a two-loop dot-product formulation. The matrix is of dimension 4K x 4K. (Each row of the matrix takes 16 KB of storage.) What is the peak achievable performance of this technique using a two-loop dot-product based matrix-vector product?

/* matrix vector product loop */
for (i = 0; i < dim; i++)
    for (j = 0; j < dim; j++)
        c[i] += a[i][j] * b[j];

Explanation / Answer


Answer :

The processor operates at 1 GHz, so the time per cycle = 1 / (1 * 10^9) s = 1 ns.

The latency to L1 cache is one cycle and the latency to DRAM is 100 cycles, so the time for one DRAM access = 100 / (1 * 10^9) s = 100 ns.

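The processor fetches four words per DRAM access (the cache line size is four words). Neither vector is reused, so both a and b must be streamed from DRAM. Fetching one cache line of a and one cache line of b takes 2 * 100 ns = 200 ns and supplies operands for four iterations of the loop, i.e. 4 multiplies + 4 adds = 8 FLOPs. Ignoring the one-cycle L1 accesses and the arithmetic itself: 8 FLOPs / 200 ns = 40 MFLOPS.

So the peak achievable performance of the dot product is 40 MFLOPS.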


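Now consider multiplying a dense matrix with a vector using a two-loop dot-product formulation. The matrix is of dimension 4K x 4K. (Each row of the matrix takes 16 KB of storage, so a word is 4 bytes.)

The vector b has 4K words = 16 KB, which fits in the 32 KB L1 cache. With an optimal placement policy it is fetched from DRAM once and then reused for every row, so only the matrix a has to be streamed from DRAM. Each 100 ns DRAM access brings in four words of a, which supply four iterations of the inner loop, i.e. 8 FLOPs.

So the peak achievable performance = 8 FLOPs / 100 ns = 80 MFLOPS.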

