Question

Consider a GPU with the following characteristics:
Clock rate: 1.6 GHz
Number of SIMD processors (i.e. Streaming Multiprocessors (SM) in NVIDIA CUDA
terminology): 16
Number of Floating Point Units per SIMD processor: 16
GPU off-chip memory bandwidth: 100 GB/s
Compute the throughput in FLoating-point Operations Per Second (FLOPS) without considering
the memory bandwidth and assuming all memory latencies can be hidden. Assuming that each
FP operation requires two operands of 4 Bytes each and outputs one 4 Byte result, is this
throughput sustainable with current memory bandwidth?

Explanation / Answer

Given:
Clock rate = 1.6 GHz
Number of SIMD processors = 16
Floating point units per SIMD processor = 16
Off-chip memory bandwidth = 100 GB/s

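So throughput in FLOPS ==> clock rate x number of SIMD processors x floating point units per SIMD processor (assuming each unit completes one FP operation per cycle)
1.6 GHz x 16 x 16 ===> 409.6 GFLOPS

Each operation reads two 4-byte operands and writes one 4-byte result, so it moves 12 bytes. The memory bandwidth required is then 409.6 x 12 = 4915.2 GB/s
Available memory bandwidth is 100 GB/s

So 4915.2 > 100 =====> not sustainable. If every operand comes from off-chip memory, 100 GB/s supports only about 100 / 12 = 8.3 GFLOPS, roughly 2% of the peak.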
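
The arithmetic above can be checked with a short script. This is only a sketch under two assumptions that the question does not state: each floating point unit completes one FP operation per cycle, and every operand and result travels to or from off-chip memory (no cache or register reuse). The function and constant names are made up for illustration.

```python
# Peak throughput versus the memory bandwidth needed to sustain it.
# Assumption: one FP operation per floating point unit per cycle.
# Assumption: all operands and results go through off-chip memory.

CLOCK_HZ = 1.6e9              # 1.6 GHz
NUM_SIMD_PROCESSORS = 16      # SMs in CUDA terminology
FPUS_PER_PROCESSOR = 16
MEMORY_BANDWIDTH = 100e9      # bytes per second
BYTES_PER_OPERATION = 3 * 4   # two 4-byte inputs plus one 4-byte output


def peak_flops(clock_hz, processors, fpus_per_processor):
    """Peak FP operations per second, ignoring memory limits."""
    return clock_hz * processors * fpus_per_processor


def required_bandwidth(flops, bytes_per_operation):
    """Bytes per second needed to feed the given FP operation rate."""
    return flops * bytes_per_operation


peak = peak_flops(CLOCK_HZ, NUM_SIMD_PROCESSORS, FPUS_PER_PROCESSOR)
needed = required_bandwidth(peak, BYTES_PER_OPERATION)
sustainable = needed <= MEMORY_BANDWIDTH
bandwidth_limited_flops = MEMORY_BANDWIDTH / BYTES_PER_OPERATION

print(f"Peak throughput:         {peak / 1e9:.1f} GFLOPS")
print(f"Required bandwidth:      {needed / 1e9:.1f} GB/s")
print(f"Available bandwidth:     {MEMORY_BANDWIDTH / 1e9:.1f} GB/s")
print(f"Sustainable:             {sustainable}")
print(f"Bandwidth-limited rate:  {bandwidth_limited_flops / 1e9:.1f} GFLOPS")
```

Changing BYTES_PER_OPERATION shows how much operand reuse in registers or on-chip memory would be needed before the peak rate becomes reachable.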
