Consider a memory system with a level 1 cache of 32 KB and DRAM of 512 MB with t
Question
Consider a memory system with a level 1 cache of 32 KB and DRAM of 512 MB with the processor operating at 1 GHz. The latency to L1 cache is one cycle and the latency to DRAM is 100 cycles. In each memory cycle, the processor fetches four words (cache line size is four words). What is the peak achievable performance of a dot product of two vectors? Note: Where necessary, assume an optimal cache placement policy.

/* dot product loop */
for (i = 0; i < dim; i++)
    dot_prod += a[i] * b[i];

Now consider the problem of multiplying a dense matrix with a vector using a two-loop dot-product formulation. The matrix is of dimension 4K x 4K. (Each row of the matrix takes 16 KB of storage.) What is the peak achievable performance of this technique using a two-loop dot-product based matrix-vector product?

/* matrix vector product loop */
for (i = 0; i < dim; i++)
    for (j = 0; j < dim; j++)
        c[i] += a[i][j] * b[j];

Explanation / Answer
Answer:
The processor operates at 1 GHz, so the time per cycle = 1 / (1 * 10^9) s = 1 ns.
The latency to L1 cache is one cycle and the latency to DRAM is 100 cycles, so the time for one DRAM access = 100 / (1 * 10^9) s = 100 ns.
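The processor fetches four words per memory access (cache line size is four words). The dot product reuses no data, so every line of a and every line of b comes from DRAM. Each group of four iterations therefore needs two line fetches: 2 * 100 = 200 cycles = 200 ns. Those four iterations perform 4 multiplies and 4 adds = 8 FLOPs.
For the dot product formulation, peak achievable performance = 8 FLOPs / (200 * 10^-9 s) = 40 MFLOPS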
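The 40 MFLOPS figure can be checked with a small helper. This is only a sketch of the latency-bound model: the function name peak_mflops and its parameters are made up for illustration, and it assumes every cache line is a DRAM miss, misses do not overlap, and each element pair costs one multiply and one add.

```c
#include <assert.h>

/*
 * Peak MFLOPS for a loop limited by DRAM latency (illustrative model).
 * Assumes every cache line is a DRAM miss, misses do not overlap, and
 * each element pair costs one multiply and one add (2 FLOPs).
 * streams_from_dram is the number of arrays that must be fetched from
 * DRAM (2 for the dot product: a and b).
 */
double peak_mflops(double clock_hz, int dram_latency_cycles,
                   int words_per_line, int streams_from_dram)
{
    assert(clock_hz > 0.0 && dram_latency_cycles > 0);
    assert(words_per_line > 0 && streams_from_dram > 0);

    /* One line per stream must arrive before words_per_line iterations run. */
    double seconds = (double)streams_from_dram * dram_latency_cycles / clock_hz;
    double flops = 2.0 * words_per_line;

    return flops / seconds / 1e6;
}
```

Calling peak_mflops(1e9, 100, 4, 2) with the numbers from the question gives 40.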
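Now consider multiplying a dense matrix with a vector using the two-loop dot-product formulation. The matrix is of dimension 4K x 4K and each row takes 16 KB of storage, so a word is 4 bytes and the vector b (4K words) also takes 16 KB. That fits in the 32 KB L1 cache, so with an optimal placement policy b is fetched once and reused for every row. Only the matrix a has to be streamed from DRAM: one four-word line every 100 cycles (100 ns), which feeds four iterations = 8 FLOPs.
So peak achievable performance = 8 FLOPs / (100 * 10^-9 s) = 80 MFLOPS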
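For the matrix-vector case the key question is whether b stays resident in L1. The sketch below makes that explicit. The names streams_from_dram and matvec_peak_mflops are hypothetical. It assumes 4-byte words (16 KB per 4K-word row), an optimal placement policy that keeps b in cache whenever it fits, non-overlapping misses, and 2 FLOPs per element.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Number of arrays that must be streamed from DRAM in the inner loop.
 * The matrix a is always streamed; b is streamed too unless the whole
 * vector fits in the cache (assumed optimal placement keeps it there).
 */
int streams_from_dram(size_t vector_words, size_t word_bytes, size_t cache_bytes)
{
    return (vector_words * word_bytes <= cache_bytes) ? 1 : 2;
}

/* Latency-bound peak MFLOPS for c[i] += a[i][j] * b[j] (illustrative model). */
double matvec_peak_mflops(double clock_hz, int dram_latency_cycles,
                          int words_per_line, size_t dim,
                          size_t word_bytes, size_t cache_bytes)
{
    assert(clock_hz > 0.0 && dram_latency_cycles > 0 && words_per_line > 0);

    int streams = streams_from_dram(dim, word_bytes, cache_bytes);

    /* Each group of words_per_line iterations waits for one line per stream. */
    double seconds = (double)streams * dram_latency_cycles / clock_hz;
    double flops = 2.0 * words_per_line;   /* one multiply + one add each */

    return flops / seconds / 1e6;
}
```

With the question's parameters, matvec_peak_mflops(1e9, 100, 4, 4096, 4, 32 * 1024) returns 80 under these assumptions. If the cache were too small to hold b (for example 8 KB), the same model falls back to the dot-product rate of 40.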