Consider the following code that adds two matrices A and B and stores the result

ID: 3774581 • Letter: C

Question

Consider the following code that adds two matrices A and B and stores the result in a matrix C:

for (i= 0 to 15) {

for (j= 0 to 63) {

C[i][j] = A[i][j] + B[i][j];

}

If we had a quadcore multiprocessor, where the elements of the matrices A, B, C are stored in row major order, which one of the following two parallelizations is better and why? What about when they are stored in column major order?

(a) For each Pk in {0, 1, 2, 3}:

for (i= 0 to 15) {

for (j= Pk*15 + Pk to (Pk+1)*15 + Pk) {

// Inner Loop Parallelization

C[i][j] = A[i][j] + B[i][j];

}

(b) For each Pk in {0, 1, 2, 3}:

for (i= Pk*3 + Pk to (Pk+1)*3 + Pk) {

// Outer Loop Parallelization

for (j= 0 to 63) {

C[i][j] = A[i][j] + B[i][j];

}

Explanation / Answer

First we will take case of row major matrix,case(a) is always better because it accesses elements row by row,hence thread can take advantage of spatial locality property of cache.Similary,when we access column by column in column major matrix there will be more number of cache miss because elements of column are separeted by number of row size.This happens in memory also and in cache also.Hence for column major matrix case(b) is better

Navigate

Consider the following code snippet: public class Vehicle { private String vehic

Consider the following code that checks to see if adjacent elements in an array

Integrity-first tutoring: explanations and feedback only — we do not complete graded work. Learn more.

Consider the following code that adds two matrices A and B and stores the result

Question

Explanation / Answer

Related Questions

Navigate