1.3. Write a program, under the conditions below, that will multiply two 2 dimen
ID: 662331 • Letter: 1
Question
1.3. Write a program, under the conditions below, that will multiply two two-dimensional square matrices of C++ type double, using standard matrix multiplication, in which each dimension (rows and columns) is 4096 (decimal).
1.3.1. Write the program to compile and execute strictly on the host in ANSI C++.
1.3.2. Write the program to compile and execute strictly on the device (GPU) under CUDA, except for the actual input and output of the program that shall be on the host.
1.3.3. Write the program to compile and execute on both the device under CUDA and on the host under ANSI C++. In this case, the program must accept input on the host, store the data on the hard drive (in a file) on the host, copy the data into host memory, actually do the calculation on the device, store the output in a file on the host, and output the data on the host.
1.3.4. Did you use pointers of any type in any of your programs above? Why or why not? Justify your approach.
Explanation / Answer
1.3.1
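#include <cstddef>
#include <iostream>
#include <vector>

// Dimension required by the question: 4096 x 4096 matrices of double.
const std::size_t N = 4096;

int main()
{
    // Each matrix holds N * N doubles (128 MiB at 8 bytes per double), so the
    // storage is allocated on the heap as flat row-major vectors.
    std::vector<double> a(N * N), b(N * N), c(N * N);
    std::size_t i, j, k;

    std::cout << "Enter the values of matrix a:\n";
    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++)
            std::cin >> a[i * N + j];

    std::cout << "Enter the values of matrix b:\n";
    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++)
            std::cin >> b[i * N + j];

    // Standard matrix multiplication: c[i][j] = sum over k of a[i][k] * b[k][j].
    for (i = 0; i < N; i++)
    {
        for (j = 0; j < N; j++)
        {
            double sum = 0.0;
            for (k = 0; k < N; k++)
                sum += a[i * N + k] * b[k * N + j];
            c[i * N + j] = sum;
        }
    }

    // Print one row per line, with values separated by spaces.
    std::cout << "The product is:\n";
    for (i = 0; i < N; i++)
    {
        for (j = 0; j < N; j++)
            std::cout << c[i * N + j] << ' ';
        std::cout << '\n';
    }
    return 0;
}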
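For a sense of scale, the sketch below computes the storage and work implied by the question's 4096 x 4096 matrices of double. The function names matrixBytes and multiplyAddCount are illustrative and not part of the original answer.

```cpp
#include <cstddef>

// Bytes needed to store one n x n matrix of double.
unsigned long long matrixBytes(unsigned long long n)
{
    return n * n * sizeof(double);
}

// Multiply-add steps performed by the standard triple-loop algorithm.
unsigned long long multiplyAddCount(unsigned long long n)
{
    return n * n * n;
}
```

With an 8-byte double, matrixBytes(4096) is 134,217,728 bytes (128 MiB) per matrix, and multiplyAddCount(4096) is 68,719,476,736. Three matrices of that size generally do not fit in small fixed-size arrays, which is one reason to allocate them dynamically.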
1.3.2
// CPU version of the multiplication on flat row-major arrays.
// The element type is double, as the question requires.
void MatrixMulOnHost(double* M, double* N, double* P, int Width)
{
    for (int i = 0; i < Width; ++i)
        for (int j = 0; j < Width; ++j)
        {
            double sum = 0;
            for (int k = 0; k < Width; ++k)
            {
                double a = M[i * Width + k];
                double b = N[k * Width + j];
                sum += a * b;
            }
            P[i * Width + j] = sum;
        }
}
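As a quick check of the row-major indexing used in MatrixMulOnHost, the sketch below runs the same triple loop as a small standalone function. The name matMulRowMajor is illustrative and not part of the original answer. It uses double because the question asks for that type.

```cpp
#include <cassert>
#include <cstddef>

// Illustrative helper (hypothetical name): P = M * N for Width x Width
// matrices stored as flat row-major arrays.
void matMulRowMajor(const double* M, const double* N, double* P, std::size_t Width)
{
    assert(P != M && P != N);  // the output must not alias an input
    for (std::size_t i = 0; i < Width; ++i)
        for (std::size_t j = 0; j < Width; ++j)
        {
            double sum = 0.0;
            for (std::size_t k = 0; k < Width; ++k)
                sum += M[i * Width + k] * N[k * Width + j];  // element (i,k) times element (k,j)
            P[i * Width + j] = sum;
        }
}
```

For example, with M = {1, 2, 3, 4} and N = {5, 6, 7, 8} (two 2 x 2 matrices stored row by row), the call matMulRowMajor(M, N, P, 2) fills P with {19, 22, 43, 50}.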
GPU kernel:
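__global__ void MatrixMulKernel(double* d_M, double* d_N, double* d_P, int Width)
{
    // One thread per output element. blockIdx is included so that a grid of
    // blocks can cover matrices larger than one block (such as 4096 x 4096).
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= Width || col >= Width) return;  // skip threads outside the matrix
    double P_val = 0;
    for (int k = 0; k < Width; ++k)
    {
        double M_elem = d_M[row * Width + k];
        double N_elem = d_N[k * Width + col];
        P_val += M_elem * N_elem;
    }
    d_P[row * Width + col] = P_val;
}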
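A thread block is limited to 1024 threads on current CUDA devices, so a 4096 x 4096 result has to be spread over a grid of blocks, with each thread deriving its global row and column from blockIdx, blockDim and threadIdx. The sketch below emulates that index arithmetic on the CPU in plain C++ to show that a ceiling-division grid visits every element exactly once. The function names and the block size of 16 used in the note are assumptions, not part of the original answer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Number of blocks needed along one dimension (ceiling division).
std::size_t blocksPerGridDim(std::size_t width, std::size_t blockDim)
{
    assert(blockDim > 0);
    return (width + blockDim - 1) / blockDim;
}

// CPU emulation of row = blockIdx.y * blockDim.y + threadIdx.y (and the same
// for col). Returns how many times each matrix element is visited.
std::vector<int> countVisits(std::size_t width, std::size_t blockDim)
{
    std::vector<int> visits(width * width, 0);
    const std::size_t grid = blocksPerGridDim(width, blockDim);
    for (std::size_t by = 0; by < grid; ++by)
        for (std::size_t bx = 0; bx < grid; ++bx)
            for (std::size_t ty = 0; ty < blockDim; ++ty)
                for (std::size_t tx = 0; tx < blockDim; ++tx)
                {
                    const std::size_t row = by * blockDim + ty;
                    const std::size_t col = bx * blockDim + tx;
                    if (row < width && col < width)  // same guard a kernel would need
                        ++visits[row * width + col];
                }
    return visits;
}
```

With a width of 4096 and 16 x 16 threads per block, blocksPerGridDim(4096, 16) gives 256 blocks along each dimension. The bounds guard matters only when the width is not a multiple of the block size.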
1.3.3
#define TILE_WIDTH 4  // elements per thread along each dimension (this value is an assumption)

__global__ void MatrixMulKernel(double* d_M, double* d_N, double* d_P, int Width)
{
    // Each thread computes a TILE_WIDTH x TILE_WIDTH tile of the result, so the
    // global thread index is scaled by TILE_WIDTH as a whole.
    int start_row = (blockDim.y * blockIdx.y + threadIdx.y) * TILE_WIDTH;
    int end_row = start_row + TILE_WIDTH;
    int start_col = (blockDim.x * blockIdx.x + threadIdx.x) * TILE_WIDTH;
    int end_col = start_col + TILE_WIDTH;
    for (int row = start_row; row < end_row && row < Width; row++)
    {
        for (int col = start_col; col < end_col && col < Width; col++)
        {
            double P_val = 0;
            for (int k = 0; k < Width; ++k)
            {
                double M_elem = d_M[row * Width + k];
                double N_elem = d_N[k * Width + col];
                P_val += M_elem * N_elem;
            }
            d_P[row * Width + col] = P_val;
        }
    }
}
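Part 1.3.3 also asks for the data to be stored in a file on the host and copied into host memory before the device does the calculation. A minimal sketch of that host-side file step is shown below in plain C++. The function names saveMatrix and loadMatrix and the raw binary file layout are assumptions, not part of the original answer.

```cpp
#include <cstddef>
#include <fstream>
#include <ios>
#include <string>
#include <vector>

// Write a flat row-major matrix to a binary file. Returns false on failure.
bool saveMatrix(const std::string& path, const std::vector<double>& m)
{
    std::ofstream out(path.c_str(), std::ios::binary);
    if (!out) return false;
    out.write(reinterpret_cast<const char*>(m.data()),
              static_cast<std::streamsize>(m.size() * sizeof(double)));
    return out.good();
}

// Read count doubles from a binary file into host memory.
// Returns false if the file cannot be opened or is too short.
bool loadMatrix(const std::string& path, std::size_t count, std::vector<double>& m)
{
    std::ifstream in(path.c_str(), std::ios::binary);
    if (!in) return false;
    m.resize(count);
    const std::streamsize bytes = static_cast<std::streamsize>(count * sizeof(double));
    in.read(reinterpret_cast<char*>(m.data()), bytes);
    return in.gcount() == bytes;
}
```

In a full program, the vector filled by loadMatrix would be the host buffer that is copied to the device before the kernel launch, and saveMatrix would be called again on the result copied back from the device.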
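1.3.4 Yes, I used pointers in the programs above: the matrix parameters M, N, P and d_M, d_N, d_P in 1.3.2 and 1.3.3 are pointers to floating-point data, not to integers. A pointer lets each function receive a large matrix without copying it and index it as a flat row-major array, and a CUDA kernel accesses device memory through pointers passed as kernel arguments.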