Academic Integrity: tutoring, explanations, and feedback — we don’t complete graded work or submit on a student’s behalf.

Consider the following code that adds two matrices A and B and stores the result

ID: 3774581 • Letter: C

Question

Consider the following code that adds two matrices A and B and stores the result in a matrix C:

for (i= 0 to 15) {

for (j= 0 to 63) {

C[i][j] = A[i][j] + B[i][j];

}

}

If we had a quadcore multiprocessor, where the elements of the matrices A, B, C are stored in row major order, which one of the following two parallelizations is better and why? What about when they are stored in column major order?

(a) For each Pk in {0, 1, 2, 3}:

for (i= 0 to 15) {

for (j= Pk*15 + Pk to (Pk+1)*15 + Pk) {

// Inner Loop Parallelization

C[i][j] = A[i][j] + B[i][j];

}

}

(b) For each Pk in {0, 1, 2, 3}:

for (i= Pk*3 + Pk to (Pk+1)*3 + Pk) {

// Outer Loop Parallelization

for (j= 0 to 63) {

C[i][j] = A[i][j] + B[i][j];

}

}

Explanation / Answer

First we will take case of row major matrix,case(a) is always better because it accesses elements row by row,hence thread can take advantage of spatial locality property of cache.Similary,when we access column by column in column major matrix there will be more number of cache miss because elements of column are separeted by number of row size.This happens in memory also and in cache also.Hence for column major matrix case(b) is better

Hire Me For All Your Tutoring Needs
Integrity-first tutoring: clear explanations, guidance, and feedback.
Drop an Email at
drjack9650@gmail.com
Chat Now And Get Quote