Attached is python code for game of life. Q. Create the plot of speed vs. number
ID: 3827088 • Letter: A
Question
Attached is python code for game of life.
Q. Create the plot of speed vs. number of processors when we incorporate parallel processing into the code. Please attach your code to create this plot.
import numpy as np, matplotlib.pyplot as plt
import time
from numpy.random import random as rnd
from multiprocessing import Pool
from matplotlib import animation
import copy
arraySize = 100
Z = np.array([[randint(0, 1) for x in range(arraySize)] for y in range(arraySize)])
def computeNeighbours(Z):
rows, cols = len(Z), len(Z[0])
N = np.zeros(np.shape(Z))
for x in range(rows):
for y in range(cols):
Q = [q for q in [x-1, x, x+1] if ((q >= 0) and (q < cols))]
R = [r for r in [y-1, y, y+1] if ((r >= 0) and (r < rows))]
S = [Z[q][r] for q in Q for r in R if (q, r) != (x, y)]
N[x][y] = sum(S)
return N
def iterate(Z):
Zprime = Z.copy()
rows, cols = len(Zprime), len(Zprime[0])
N = computeNeighbours(Zprime)
for x in range(rows):
for y in range(cols):
if Zprime[x][y] == 1:
if (N[x][y] < 2) or (N[x][y] > 3):
Zprime[x][y] = 0
else:
if (N[x][y] == 3):
Zprime[x][y] = 1
return Zprime
fig = plt.figure()
Zs = [Z]
ims = []
for i in range(0, 100):
im = plt.imshow(Zs[len(Zs)-1], interpolation = 'nearest', cmap='binary')
ims.append([im])
Zs.append(iterate(Zs[len(Zs)-1]))
ani = animation.ArtistAnimation(fig, ims, interval=250, blit=True)
plt.show()
Explanation / Answer
strong scaling you're assuming a fixed workload size so if you increase the number of threads, the size of the data that each thread needs to work on decreases. On modern CPUs memory accesses are expensive and would be preferable to maintain locality by keeping the data in caches. Therefore, the likely optimal number of threads can be found when the dataset of each thread fits in each core's cache.This holds true even when the number of threads exceeds the number of cores.run with four threads where each thread needs to complete 2AU. Each thread takes 10s to complete.With four cores the total amount of time will be 10s. run with eight threads where each thread needs to complete 1AU. Each thread takes only 2s.speaking from computation and memory bound point of view.4000 threads will make application run really slow. Part of the problem is a very high overhead of context switching and most likely very poor memory locality.But it also depends on your architecture. From where I heard Niagara processors are suppose to be able to handle multiple threads on a single core using some kind of advanced pipelining technique. However I have no experience with those processors.
Related Questions
Navigate
Integrity-first tutoring: explanations and feedback only — we do not complete graded work. Learn more.