Question
Please, anyone, help me with this; screenshots of test runs please.
Write a parallel program with MPI that supports the following computation.
1) It generates five processes: P0, P1, P2, P3, and P4.
2) The main process gets a number n from the keyboard, then initializes MPI.
3) Process Pi (i = 0, 1, 2, 3) uses n to call the following two functions:
a) prime(int n) finds the smallest prime number q such that q = 8m + (2i+1) > n for some integer m. Note: a prime number p is an integer that is not the product of two integers less than p.
b) twin(int n) finds the least twin pair (q, q+2) such that q = 8m + (2i+1) > n for some integer m. A pair (q, q+2) is a twin if both q and q+2 are prime numbers.
4) P4 gets all four results from the other four processes and returns the least prime number and the least twin pair.
For example, if n = 10, P0 returns 11 and (11, 13), P1 returns 13 and (17, 19), P2 returns 17 and (17, 19), and P3 returns 19 and (29, 31). Finally, P4 returns 11 and (11, 13).
All five processes share the same program.
Your complete C++ code for this project, with a summary in a Word document; please keep the code readable.
Screenshots of test runs.
Please, anyone, help me with this program; it's urgent.
Explanation / Answer
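Before the parallel π walkthrough below, here is a minimal serial sketch of the two helper functions the question describes, assuming simple trial division. The explicit i parameter and the helper is_prime() are illustrative additions (in the actual assignment, i would come from the MPI rank, and the message passing among P0..P4 is left out entirely):

/* Sketch: trial-division primality test.
   p > 1 is prime if no integer in [2, sqrt(p)] divides it. */
static int is_prime(long p) {
    if (p < 2) return 0;
    for (long d = 2; d * d <= p; d++)
        if (p % d == 0) return 0;
    return 1;
}

/* Smallest prime q with q = 8m + (2i+1) and q > n. */
long prime(int n, int i) {
    long q = n + 1;
    while (q % 8 != 2 * i + 1 || !is_prime(q)) q++;
    return q;
}

/* Least q with q = 8m + (2i+1), q > n, and both q and q+2 prime;
   the twin pair is (q, q+2). */
long twin(int n, int i) {
    long q = n + 1;
    while (q % 8 != 2 * i + 1 || !is_prime(q) || !is_prime(q + 2)) q++;
    return q;
}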
We can select an integrand and limits of integration so that the integral is mathematically equal to π; the classic choice is ∫₀¹ 4/(1+x²) dx = π. This makes checking the correctness of the program straightforward. A simple C program implementing this algorithm follows:
#include <stdio.h>

static long num_steps = 100000;
double step;

int main ()
{
    int i;
    double x, pi, sum = 0.0;

    step = 1.0/(double) num_steps;

    for (i = 0; i < num_steps; i++) {
        x = (i + 0.5) * step;          /* midpoint of interval i */
        sum = sum + 4.0/(1.0 + x*x);
    }
    pi = step * sum;
    printf("pi = %f\n", pi);
    return 0;
}
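The loop is the midpoint rule applied to that integral: with $N$ = num_steps, step $= 1/N$, and midpoints $x_i = (i + 0.5)\cdot\text{step}$,

$$\pi = \int_0^1 \frac{4}{1+x^2}\,dx \;\approx\; \frac{1}{N}\sum_{i=0}^{N-1}\frac{4}{1+x_i^2},$$

which is exactly the value assigned to pi at the end of the program.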
OpenMP
OpenMP [omp] is an industry-standard API for writing parallel application programs for shared-memory computers. The primary goal of OpenMP is to make the loop-oriented programs common in high-performance computing easier to write. Constructs were included in OpenMP to support SPMD, master-worker, pipeline, and most other types of parallel algorithms as well [Mattson05].
OpenMP has been a very successful parallel language. It is available on every shared-memory computer on the market. Recently Intel has created a variation on OpenMP to support clusters as well. OpenMP supports a style of programming where parallelism is added incrementally, so an existing sequential program evolves into a parallel program. This advantage, however, is also OpenMP's greatest weakness: by parallelizing incrementally, a programmer might miss the large-scale restructuring of a program often required to get the best performance.
OpenMP is a continuously evolving standard. An industry group called the "OpenMP Architecture Review Board" meets regularly to develop new extensions to the language. The next release of OpenMP (version 3.0) will include a task-queue capability. This will allow OpenMP to handle a wider range of control structures as well as more general recursive algorithms.
OpenMP Overview
OpenMP is based on the fork-join programming model. A running OpenMP program starts as a single thread. When the programmer wishes to exploit concurrency in the program, additional threads are forked to create a team of threads. These threads execute in parallel across a region of code called a parallel region. At the end of the parallel region, the threads wait until all of the threads have finished their work, and then they join back together. At that point, the original or “master” thread continues until the next parallel region is encountered (or the end of the program).
The language constructs in OpenMP are defined in terms of compiler directives that tell the compiler what to do in order to implement the desired parallelism. In C and C++ these directives are defined in terms of pragmas.
The OpenMP pragmas have the same form in every case:
#pragma omp construct_name one_or_more_clauses
The construct_name defines the parallel action desired by the programmer while the clauses modify that action or control the data environment seen by the threads.
OpenMP is an explicit parallel programming language. If a thread is created or work is mapped onto that thread, the programmer must specify the desired action. Therefore, even a simple API such as OpenMP has a wide range of constructs and clauses the programmer must learn. Fortunately, a great deal can be done with OpenMP using only a small subset of the full language.
To create threads in OpenMP, you use the "parallel" construct:
#pragma omp parallel
{
    ... a block of statements
}
When used by itself without any modifying clauses, this construct creates a number of threads chosen by the runtime environment (often equal to the number of processors or cores). Each thread executes the block of statements following the parallel pragma. This can be almost any set of legal statements in C; the only exception is that you must not branch into or out of the block. This makes sense if you think about it: if all the threads are going to execute the set of statements, and if the resulting behavior of the program is to make sense, then you can't have arbitrary threads branching into or out of the construct within the parallel region. This is a common constraint in OpenMP. We call a block of statements lacking such branches a "structured block".
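For example, here is a minimal sketch of a parallel region. The runtime calls omp_get_thread_num() and omp_get_num_threads() are standard OpenMP; the order of the output lines varies from run to run:

#include <stdio.h>
#include <omp.h>

int main ()
{
    #pragma omp parallel                  /* fork a team of threads */
    {
        int id = omp_get_thread_num();    /* this thread's ID */
        int n  = omp_get_num_threads();   /* size of the team */
        printf("hello from thread %d of %d\n", id, n);
    }                                     /* implicit join at the end */
    return 0;
}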
You can do a great deal of parallel programming by having each thread execute the same statements. But to experience the full power of OpenMP, we need to do more: we need to share the work of executing the set of statements among the threads. We call this type of behavior "work sharing". The most common work-sharing construct is the loop construct, which in C applies to a for loop:
#pragma omp for
This only works for simple loops with the canonical form:

for (i = lower_limit; i < upper_limit; inc_exp)
The for construct takes the iterations of the loop and parcels them out among a team of threads created earlier with a parallel construct. The loop limits and the expression used to increment the loop index (inc_exp) must be fully determined before the loop executes, and any constants used in these expressions must be the same across all the threads in the team. This makes sense if you think about it: the system needs to figure out how many iterations the loop will have and then map them onto sets that can be handed out to the team of threads. This can only be done in a consistent and well-behaved manner if all the threads compute the same index sets.
Notice that the for construct does not create threads; you can only do that with a parallel construct. As a shortcut, you can put the parallel and for constructs together in one pragma:
#pragma omp parallel for
This creates a team of threads to execute the iterations of an immediately following loop.
The iterations of the loop must be independent, so that the result of the loop is the same regardless of the order in which the iterations are executed or which threads execute which iterations. If one thread writes a variable and another thread reads it, we have a loop-carried dependence, and the program will generate incorrect results. The programmer must carefully analyze the body of a loop to make sure there are no loop-carried dependencies. In many cases, a loop-carried dependency arises from a variable used to hold intermediate results within a given iteration of the loop. In that case, you can remove the dependency by declaring that each thread is to have its own copy of the variable. This is done with a private clause. For example, if a loop uses a variable named "tmp" to hold a temporary value, you could add the following clause to an OpenMP construct so that tmp can be used inside the loop body without causing any loop-carried dependencies:
private(tmp)
Another common situation occurs when a variable appears inside a loop and is used to accumulate values from each iteration. For example, you may have a loop that sums the results of a computation into a single value. This pattern is so common in parallel programming that it has a name: a reduction. In OpenMP, we have a reduction clause:
reduction(+:sum)
As with the private clause, this is added to an OpenMP construct to tell the compiler to expect a reduction. A temporary private variable is created and used to accumulate a partial result for each thread. Then, at the end of the construct, the values from each thread are combined to yield the final answer. The operation used in the reduction is also specified in the clause; in this case, the operation is "+". OpenMP defines the initial value of the private variable used in the reduction to be the identity of the mathematical operation in question. For example, for "+", this value is zero.
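Conceptually, reduction(+:sum) behaves roughly like the following hand-coded version (a sketch of the idea, not what any particular compiler literally emits; the critical construct makes the combining step safe):

#include <omp.h>

double sum_array(int n, const double *a)
{
    double sum = 0.0;             /* the shared result */
    #pragma omp parallel
    {
        double my_sum = 0.0;      /* private partial, initialized to the "+" identity */
        #pragma omp for
        for (int i = 0; i < n; i++)
            my_sum += a[i];
        #pragma omp critical      /* combine partials one thread at a time */
        sum += my_sum;
    }
    return sum;
}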
There is much more to OpenMP, but with these two constructs and two clauses, we can explain how to parallelize the π program.
The OpenMP π Program
To keep things simple, we will leave the number of steps fixed, and we will work only with the default number of threads. In the serial π program there is a single loop to parallelize. The iterations of the loop are completely independent except for the temporary variable "x" and the accumulation variable "sum". Notice that "x" is used as temporary storage for the computation within a loop iteration, so we can handle it by making it local to each thread with a private clause:
private(x)
Technically, the loop control index creates a loop-carried dependence. OpenMP, however, understands that the loop control index needs to be local to each thread, so it automatically makes the index private to each thread.
The accumulation variable, "sum", is used in a summation. This is a classic reduction, so we can use the reduction clause:
reduction(+:sum)
Adding these clauses to the "parallel for" construct, we have our π program parallelized with OpenMP:
#include "omp.h"
#include <stdio.h>

static long num_steps = 100000;
double step;

int main ()
{
    int i;
    double x, pi, sum = 0.0;

    step = 1.0/(double) num_steps;

    #pragma omp parallel for private(x) reduction(+:sum)
    for (i = 0; i < num_steps; i++) {
        x = (i + 0.5) * step;
        sum = sum + 4.0/(1.0 + x*x);
    }
    pi = step * sum;
    printf("pi = %f\n", pi);
    return 0;
}
Note that we also included the standard include file for OpenMP:

#include "omp.h"
MPI

An MPI program begins by initializing the MPI environment and determining each process's rank and the total number of processes:

int my_id, numprocs;

MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &my_id);
MPI_Comm_size(MPI_COMM_WORLD, &numprocs);

It ends by shutting the MPI environment down:

MPI_Finalize();
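For example, a minimal complete MPI program using just these routines (a sketch; every process runs the same code but prints a different message depending on its rank):

#include <stdio.h>
#include "mpi.h"

int main (int argc, char *argv[])
{
    int my_id, numprocs;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_id);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);

    if (my_id == 0)                  /* rank 0 often acts as the master */
        printf("master: %d processes total\n", numprocs);
    else
        printf("worker %d reporting\n", my_id);

    MPI_Finalize();
    return 0;
}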
In between these routines is the work of the MPI program. Most of the program is regular serial code in the language of your choice. As mentioned before, while every process executes the same code, the behavior of the program differs based on the process rank. At points where communication or some other interaction between processes is required, MPI routines are inserted. The first version of MPI had over 120 routines, and the later version (MPI 2.0) is even larger. Most programs, however, use only a tiny subset of MPI functions. We will discuss only one: a routine that carries out a reduction and returns the final reduced result to one of the processes in the group.
int MPI_Reduce(void* sendbuf, void* recvbuf,
               int count, MPI_Datatype datatype, MPI_Op op,
               int root, MPI_Comm comm);
The MPI π Program
The MPI π program is a straightforward modification of the original serial code. To keep things as simple as possible, we will continue to set the number of steps in the program itself rather than read the value and broadcast it to the other processes.
The program opens with the MPI include file to define the datatypes, constants, and routines in MPI. We then include the standard trio of routines to initialize the MPI environment and make the basic parameters (number of processes and rank) available to the program.
#include "mpi.h"
#include <stdio.h>

static long num_steps = 100000;

int main (int argc, char *argv[])
{
    int i, my_id, numprocs;
    double x, pi, step, sum = 0.0;

    step = 1.0/(double) num_steps;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_id);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);

    /* cyclic distribution: process my_id handles iterations
       my_id, my_id + numprocs, my_id + 2*numprocs, ... */
    for (i = my_id; i < num_steps; i += numprocs)
    {
        x = (i + 0.5) * step;
        sum += 4.0/(1.0 + x*x);
    }
    sum *= step;

    MPI_Reduce(&sum, &pi, 1, MPI_DOUBLE, MPI_SUM, 0,
               MPI_COMM_WORLD);

    if (my_id == 0)
        printf("pi = %f\n", pi);

    MPI_Finalize();
    return 0;
}
The partial sums from all of the processes are combined with a single call to MPI_Reduce:

MPI_Reduce(&sum, &pi, 1, MPI_DOUBLE, MPI_SUM, 0,
           MPI_COMM_WORLD);

This sums (MPI_SUM) one value of type MPI_DOUBLE from each process and places the result in pi on the root process, rank 0.
The Java Threads π Program
In this simple example we show how one would write a parallelized version of the π program with the help of "plain" Java threads:
public class PI1 {
    static long num_steps = 100000;
    static double step;
    static double sum = 0.0;
    static int part_step;    // number of worker threads

    static class PITask extends Thread {
        int part_number;     // this thread's starting index
        double x = 0.0;
        double sum = 0.0;    // this thread's partial sum

        public PITask(int part_number) {
            this.part_number = part_number;
        }

        public void run() {
            // cyclic distribution: thread k handles steps k, k + part_step, ...
            for (int i = part_number; i < num_steps; i += part_step) {
                x = (i + 0.5) * step;
                sum += 4.0 / (1.0 + x * x);
            }
        }
    }

    public static void main(String[] args) {
        int i;
        double pi;
        step = 1.0 / (double) num_steps;
        part_step = Runtime.getRuntime().availableProcessors();
        PITask[] part_sums = new PITask[part_step];
        for (i = 0; i < part_step; i++) {
            (part_sums[i] = new PITask(i)).start();
        }
        for (i = 0; i < part_step; i++) {
            try {
                part_sums[i].join();
            } catch (InterruptedException e) {
            }
            sum += part_sums[i].sum;   // accumulate after join: no race
        }
        pi = step * sum;
        System.out.println(pi);
    }
}
A second version uses the FJTask fork/join framework from the util.concurrent library, creating one small task per step and letting the framework schedule them:

import EDU.oswego.cs.dl.util.concurrent.FJTask;
import EDU.oswego.cs.dl.util.concurrent.FJTaskRunnerGroup;

public class PI2 {
    static int num_steps = 100000;
    static double step;
    static double sum = 0.0;

    static class PITask extends FJTask {
        int i = 0;           // the single step this task computes
        double sum = 0.0;    // this task's contribution

        public PITask(int i) {
            this.i = i;
        }

        public void run() {
            double x = (i + 0.5) * step;
            sum += 4.0 / (1.0 + x * x);
        }
    }

    public static void main(String[] args) {
        int i;
        double pi;
        step = 1.0 / (double) num_steps;
        try {
            // one runner thread per available processor
            FJTaskRunnerGroup g = new FJTaskRunnerGroup(
                    Runtime.getRuntime().availableProcessors());
            PITask[] tasks = new PITask[num_steps];
            for (i = 0; i < num_steps; i++) {
                tasks[i] = new PITask(i);
            }
            g.invoke(new FJTask.Par(tasks));   // run all tasks, wait for completion
            for (i = 0; i < num_steps; i++) {
                sum += tasks[i].sum;
            }
            pi = step * sum;
            System.out.println(pi);
            System.out.println(Math.PI);       // reference value for comparison
        } catch (InterruptedException ie) {
        }
    }
}