Academic Integrity: tutoring, explanations, and feedback — we don’t complete graded work or submit on a student’s behalf.

From text book : Patterson and Hennessy: Computer Organization and Design The Ha

ID: 3576963 • Letter: F

Question

From text book : Patterson and Hennessy: Computer Organization and Design The Hardware/software
Interface, 5 th Ed.

6.6 Matrix multiplication plays an important role in a number of applications.
Two matrices can only be multiplied if the number of columns of the fi rst matrix is
equal to the number of rows in the second.
Let’s assume we have an m × n matrix A and we want to multiply it by an n × p
matrix B. We can express their product as an m × p matrix denoted by AB (or A B).
If we assign C = AB, and ci,j denotes the entry in C at position (i, j), then for each
element i and j with 1 i m and 1 j p. Now we want to see if we can parallelize
the computation of C. Assume that matrices are laid out in memory sequentially as
follows: a1,1, a2,1, a3,1, a4,1, …, etc.
6.6.1 [10] <§6.5> Assume that we are going to compute C on both a single core
shared memory machine and a 4-core shared-memory machine. Compute the
speedup we would expect to obtain on the 4-core machine, ignoring any memory
issues.
6.6.2 [10] <§6.5> Repeat Exercise 6.6.1, assuming that updates to C incur a cache
miss due to false sharing when consecutive elements are in a row (i.e., index i) are
updated.
6.6.3 [10] <§6.5> How would you fi x the false sharing issue that can occur?

Explanation / Answer

6.6.1)We would easily get this done

We would get a speed up of 4 times as there are 4 cores available to us

6.6.2)we would get same time as every item is a cache miss

6.6.3)we can fix it using tiling algorithm i.e dividing the array into smaller parts such that they fix exatly into the cache

Hire Me For All Your Tutoring Needs
Integrity-first tutoring: clear explanations, guidance, and feedback.
Drop an Email at
drjack9650@gmail.com
Chat Now And Get Quote