
Consider the following GPU that consists of 8 multiprocessors clocked at 1.5 GHz


Question

Consider the following GPU that consists of 8 multiprocessors clocked at 1.5 GHz, each of which contains 8 multithreaded single-precision floating-point units and integer processing units. It has a memory system that consists of 8 partitions of 1 GHz Graphics DDR3 DRAM, each 8 bytes wide and with 256 MB of capacity. Making reasonable assumptions (state them), and using a naive matrix multiplication algorithm, compute how much time the computation C = A * B would take. A, B, and C are n * n matrices and n is determined by the amount of memory the system has.

Explanation / Answer

For this problem we assume that the GPU has a single-precision FP multiply-add instruction that counts as 2 FLOPs, and that each SP unit issues one such instruction per clock.
The peak single-precision FP multiply-add performance is given by
   peak performance = #MPs * #SPs/MP * #FLOPs/instr/SP * #instr/clock * #clocks/sec
With the data above, we get
   peak performance = 8 * 8 * 2 * 1 * 1.5 G = 192 GFLOP/s
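
As a quick check of the product above, here is a minimal Python sketch. The function and parameter names are made up for illustration, and the values of 2 FLOPs per multiply-add and one instruction per clock are the assumptions stated in this answer, not figures given in the question.

```python
# Peak single-precision throughput under the stated assumptions.
def peak_flops(num_mps=8,            # multiprocessors (from the question)
               sp_per_mp=8,          # SP units per multiprocessor (from the question)
               flops_per_instr=2,    # assumption: multiply-add = 1 multiply + 1 add
               instr_per_clock=1,    # assumption: one instruction per SP unit per clock
               clock_hz=1.5e9):      # 1.5 GHz (from the question)
    """Return peak FLOP/s as the product of the five factors."""
    return num_mps * sp_per_mp * flops_per_instr * instr_per_clock * clock_hz


if __name__ == "__main__":
    print(f"{peak_flops() / 1e9:.0f} GFLOP/s")
```

Changing `flops_per_instr` to 1 would model a GPU without a multiply-add instruction and halve the peak.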

The total size of the DDR3 memory is 8 partitions * 256 MB = 2048 MB.

The peak DDR3 bandwidth is
   peak bandwidth = #partitions * #bytes/transfer * #transfers/clock * #clocks/sec
                  = 8 * 8 * 2 * 1 G
                  = 128 GB/sec
(Double-data-rate memory performs 2 transfers per clock.)
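
The capacity and bandwidth figures can be reproduced with a short Python sketch. The helper names are hypothetical, and 2 transfers per clock is the usual double-data-rate assumption used in this answer.

```python
# Memory-system totals for the GPU described in the question.
def total_capacity_bytes(partitions=8, mb_per_partition=256):
    """Total DRAM capacity in bytes, taking 1 MB = 1024 * 1024 bytes."""
    return partitions * mb_per_partition * 1024 * 1024


def peak_bandwidth(partitions=8,
                   bytes_per_transfer=8,    # each partition is 8 bytes wide
                   transfers_per_clock=2,   # assumption: double data rate
                   clock_hz=1e9):           # 1 GHz memory clock
    """Peak bandwidth in bytes per second."""
    return partitions * bytes_per_transfer * transfers_per_clock * clock_hz


if __name__ == "__main__":
    print(f"{total_capacity_bytes() // (1024 * 1024)} MB")
    print(f"{peak_bandwidth() / 1e9:.0f} GB/s")
```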

Single-precision values occupy 32 bits (4 bytes). To hold three n * n SP matrices in memory, the largest n must satisfy
   3 * n^2 * 4 bytes <= 2048 * 1024 * 1024 bytes
   n <= sqrt(2048 * 1024 * 1024 / 12), which is about 13377.5
i.e. n = 13377
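
The bound on n can be checked with integer arithmetic. This is a small sketch with a hypothetical helper name; it assumes all three matrices must fit in memory at once and that 1 MB is 1024 * 1024 bytes.

```python
import math


def max_matrix_dim(total_bytes, num_matrices=3, bytes_per_element=4):
    """Largest n such that num_matrices n-by-n matrices fit in total_bytes."""
    max_elements_per_matrix = total_bytes // (num_matrices * bytes_per_element)
    # isqrt gives the floor of the square root without floating-point error.
    return math.isqrt(max_elements_per_matrix)


if __name__ == "__main__":
    n = max_matrix_dim(2048 * 1024 * 1024)
    print(n)
    # Sanity check: n fits, n + 1 does not.
    print(3 * n * n * 4 <= 2048 * 1024 * 1024,
          3 * (n + 1) ** 2 * 4 <= 2048 * 1024 * 1024)
```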

Now, using the naive matrix multiplication algorithm, the number of operations is calculated as follows:
For each element of the result, n multiply-adds are required.
For each row of the result, n * n multiply-adds are required.
For the entire result matrix, n * n * n multiply-adds are required.
So n^3 = 13377^3 is approximately 2394 G multiply-adds, or about 4787 GFLOP at 2 FLOPs per multiply-add.
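
To make the n * n * n count concrete, here is a minimal sketch of the naive triple loop in plain Python that also counts multiply-adds. It is only an illustration on tiny matrices, not GPU code, and the function name is made up.

```python
def naive_matmul(a, b):
    """Multiply two n-by-n matrices (lists of lists); also count multiply-adds."""
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    multiply_adds = 0
    for i in range(n):              # each row of the result
        for j in range(n):          # each element in that row
            acc = 0.0
            for k in range(n):      # n multiply-adds per element
                acc += a[i][k] * b[k][j]
                multiply_adds += 1
            c[i][j] = acc
    return c, multiply_adds


if __name__ == "__main__":
    c, count = naive_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
    print(c, count)                 # count equals n**3 = 8 for n = 2
    print(f"{13377 ** 3:.4e} multiply-adds for n = 13377")
```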

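We do not model a cache; we simply assume that each matrix element crosses the memory interface once, so the two input matrices are loaded and the result matrix is stored.
   data moved = 3 * n^2 * 4 bytes
              = about 2.15 * 10^9 bytes (the full 2048 MB of memory, by the choice of n)
Hence the data transfer takes about 2.15 / 128 = 0.017 seconds.
Each of the n^3 (about 2394 G) multiply-adds counts as 2 FLOPs against the 192 GFLOP/s peak, so the processing takes about 2 * 2394 / 192 = 24.93 seconds.
Finally, assuming transfer and computation do not overlap, the entire matrix multiplication takes
   0.017 + 24.93 = about 24.95 seconds.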
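
The whole estimate can be put in one small Python sketch. The assumptions are stated explicitly because the result depends on them: each multiply-add is counted as 2 FLOPs so that the operation count is in the same units as the 192 GFLOP/s peak, each of the 3 * n^2 four-byte elements moves over the memory interface once, and transfer and computation do not overlap. The function name and defaults are hypothetical.

```python
def estimate_seconds(n, peak_flops=192e9, bandwidth=128e9, bytes_per_element=4):
    """Return (compute_s, memory_s, total_s) for a naive n-by-n matrix multiply."""
    flops = 2 * n ** 3                          # n^3 multiply-adds, 2 FLOPs each
    bytes_moved = 3 * n ** 2 * bytes_per_element  # load A and B, store C, once each
    compute_s = flops / peak_flops
    memory_s = bytes_moved / bandwidth
    return compute_s, memory_s, compute_s + memory_s  # assumes no overlap


if __name__ == "__main__":
    compute_s, memory_s, total_s = estimate_seconds(13377)
    print(f"compute {compute_s:.2f} s, memory {memory_s:.3f} s, total {total_s:.2f} s")
```

Under these assumptions the estimate is dominated by computation; a model with no data reuse at all would move far more than 3 * n^2 elements and shift the balance toward memory.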
