Question
1.7. Amdahl’s law
Amdahl’s law can be applied in contexts other than parallel processing. Suppose that a numerical application consists of 20% floating-point and 80% integer/control operations (these percentages are based on operation counts rather than execution times). A floating-point operation takes three times as long to execute as the other operations. We are considering a redesign of the floating-point unit in a microprocessor to make it faster.
a. Formulate a more general version of Amdahl’s law in terms of selective speed-up of a portion of a computation rather than in terms of parallel processing.
b. How much faster should the new floating-point unit be to achieve a 25% overall speed improvement?
c. What is the maximum speed-up we can hope to achieve by modifying only the floating-point unit?
Explanation / Answer
The main reason for introducing parallelism is to reduce the elapsed time needed to execute your code. One way to check whether a code is well parallelized is to see whether it achieves a good speedup:
speedup(N) = elapsed time using 1 CPU / elapsed time using N CPUs
Two possibilities exist for obtaining the elapsed time of the one-CPU execution:
Speedup measured as T(1opt)/T(N), where T(1opt) is the run time of the best serial implementation, highlights any algorithmic efficiencies that had to be sacrificed to obtain the parallel version. In addition, none of the parallel implementation penalties are hidden by this comparison, so the speedup is not exaggerated.
Speedup measured as T(1)/T(N), where T(1) is the run time of the parallel code on a single CPU, shows how well the problem copes with an increasing number of CPUs and thus indicates the scalability of the parallel implementation. Ignoring any parallel overhead, the limit to the speedup measured as T(1)/T(N) is set by the fraction of the program that can be run in parallel, and this fraction is typically less than 100%. Amdahl's law says that the maximum theoretical speedup one can achieve is:
speedup(N, f) = T(1)/T(N) = (ts + tp) / (ts + tp/N) = 1 / (1 - f + f/N)
ts = time spent in the sequential region
tp = time spent in the parallel region
f = tp / (ts + tp), the parallel fraction
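To make the formula concrete, here is a minimal Python sketch (the function names amdahl_speedup and generalized_speedup are mine, not from the original text). It evaluates the formula above and the more general form asked for in part (a), in which a portion of the computation taking time fraction f is made s times faster instead of being spread over N CPUs, and then applies it to the fractions given in the question: 20% of the operations are floating-point and each takes three times as long as an integer/control operation, so the floating-point time fraction is 0.2·3 / (0.2·3 + 0.8) = 3/7.

```python
def amdahl_speedup(f, n):
    """Classic Amdahl's law: f is the parallel time fraction, n the CPU count."""
    return 1.0 / ((1.0 - f) + f / n)

def generalized_speedup(f, s):
    """Generalized form (part a): a portion taking time fraction f of the
    original run time is made s times faster; the rest is unchanged."""
    return 1.0 / ((1.0 - f) + f / s)

# Time fraction spent in floating-point work (from the question statement):
# 20% of operations, each taking 3x as long as an integer/control operation.
f_fp = (0.2 * 3) / (0.2 * 3 + 0.8)               # = 3/7, about 0.4286

# Part b: required floating-point speedup s for a 25% overall improvement.
# Solve 1.25 = 1 / ((1 - f) + f/s)  =>  s = f / (1/1.25 - (1 - f))
target = 1.25
s_needed = f_fp / (1.0 / target - (1.0 - f_fp))  # about 1.875

# Part c: the limiting overall speedup as s -> infinity is 1 / (1 - f).
s_max = 1.0 / (1.0 - f_fp)                       # = 1.75

print(f"FP time fraction       : {f_fp:.4f}")
print(f"FP speedup for +25%    : {s_needed:.3f}")
print(f"Maximum overall speedup: {s_max:.2f}x")
```

Under these assumptions the sketch reports that the floating-point unit must be about 1.875 times faster to obtain a 25% overall improvement, and that the overall speedup can never exceed 1.75 no matter how fast the new unit becomes.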
In data parallelism, one focuses first on dividing the data into small pieces and then partitions the computation (functions) by associating each operation with the data it works on. This partitioning yields a number of tasks, each comprising some data and a set of operations on that data. If an operation requires data from several tasks, communication is required to move data between tasks.
Data parallelism is commonly applied at the loop level of a code. Some code segments in a loop run concurrently on different data elements: the data elements are assigned to various processors and subjected to identical processing, with communication among processors to exchange data when needed. With data parallelism, the amount of communication is usually high (fine-grained parallelism).
In a data parallel environment (loop-level parallelism), data can be decomposed implicitly or explicitly; a small loop-level sketch follows this list:
Explicit (manual) decomposition is done by the programmer with no compiler help. Examples: MPI, PVM, shmem.
Implicit (compiler-directive-based) decomposition is carried out by the compiler (MP directives find parallelism in implicitly decomposed data).
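As a concrete illustration of loop-level data parallelism, here is a minimal Python sketch using the standard multiprocessing module; the function scale_chunk and the chunking scheme are my own choices for illustration, not part of the original text. The data are split into pieces and every worker applies the same operation to its own piece, with a gather step at the end.

```python
from multiprocessing import Pool

def scale_chunk(chunk):
    """Identical processing applied to each piece of the data (illustrative only)."""
    return [2.0 * x for x in chunk]

if __name__ == "__main__":
    data = list(range(1_000))
    n_workers = 4

    # Explicit decomposition: split the data into contiguous, roughly equal pieces.
    chunk_size = (len(data) + n_workers - 1) // n_workers
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

    # Each worker runs the same loop body on its own piece of the data.
    with Pool(n_workers) as pool:
        partial_results = pool.map(scale_chunk, chunks)

    # Gather step: combine the partial results back into a single list.
    result = [y for part in partial_results for y in part]
    print(len(result), result[:5])
```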
In functional parallelism, one focuses first on dividing the computation (functions) to be performed into disjoint tasks and then associates data with each task. When data used in different tasks overlap, communication takes place.
Functional parallelism is usually applied at coarser levels of a code, such as subroutines, so that multiple code segments run concurrently. It works well if these code segments are independent of each other and little communication is needed (coarse-grained parallelism). If the code segments need significant overlap of data, considerable communication is required to avoid replication of data; this is often a sign that a data parallelism approach should be considered instead.
A good example of functional parallelism is the simulation of the Earth's climate, which may comprise components such as the atmosphere, the ocean, and so on. The simulations of these components are run concurrently, and data are exchanged between the components during the computation.
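To make the idea concrete, below is a minimal Python sketch of functional parallelism using concurrent.futures; the component functions atmosphere_step and ocean_step are hypothetical stand-ins for the climate example, not real simulation code. The two components run concurrently, and their results are exchanged once both have finished a step.

```python
from concurrent.futures import ProcessPoolExecutor

def atmosphere_step(state):
    """Hypothetical atmosphere component: advance its state by one step."""
    return {"temperature": state["temperature"] + 0.1}

def ocean_step(state):
    """Hypothetical ocean component: advance its state by one step."""
    return {"sea_surface_temp": state["sea_surface_temp"] + 0.05}

if __name__ == "__main__":
    atmos = {"temperature": 15.0}
    ocean = {"sea_surface_temp": 10.0}

    with ProcessPoolExecutor(max_workers=2) as pool:
        for step in range(3):
            # Run the two independent components concurrently (functional parallelism).
            f_atmos = pool.submit(atmosphere_step, atmos)
            f_ocean = pool.submit(ocean_step, ocean)
            atmos, ocean = f_atmos.result(), f_ocean.result()

            # Exchange of data between components during the computation;
            # here it is just a trivial coupling for illustration.
            atmos["temperature"] += 0.01 * ocean["sea_surface_temp"]

    print(atmos, ocean)
```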