Question
need answer for 1a,1b and 1c
Performance of an application running on multiprocessors is typically measured by scalability. There are two basic ways to measure the parallel performance of a given application, referred to as strong and weak scaling. What are the definitions of strong and weak scaling?

What are the advantages and disadvantages of fine-grained multithreading, coarse-grained multithreading, and simultaneous multithreading?

Given the following instruction sequence of three threads, how many clock cycles will fine-grained multithreading, coarse-grained multithreading, and simultaneous multithreading (SMT) use, respectively? Assume the threads run on a multiprocessor with four issue slots. Assume also that coarse-grained multithreading only switches when there is a stall longer than 1. Each X represents an occupied issue slot and each row represents a clock cycle.

Consider the following three CPU organizations: CPU SS: A 2-core superscalar microprocessor that provides out-of-order issue…

Explanation / Answer
1a. Strong Scaling
In strong scaling, the problem size stays fixed while the number of processing elements is increased. This is used as justification for programs that take a long time to run (something that is CPU-bound). The goal is to find a "sweet spot" that allows the computation to complete in a reasonable amount of time, yet does not waste too many cycles on parallel overhead. A program is considered to scale linearly if the speedup (in terms of work units completed per unit time) is equal to the number of processing elements used (N). In general, it is harder to achieve good strong scaling at larger processor counts, since for most algorithms the communication overhead increases in proportion to the number of processes used.
Calculating Strong Scaling Efficiency
If the amount of time to complete a work unit with 1 processing element is t1, and the amount of time to complete the same unit of work with N processing elements is tN, the strong scaling efficiency (as a percentage of linear) is given as:
t1 / ( N * tN ) * 100%
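The formula above can be checked with a minimal Python sketch; the timings used below are hypothetical measurements, not values from the question.

```python
def strong_scaling_efficiency(t1, tN, N):
    """Strong scaling efficiency as a percentage of linear speedup:
    t1 / (N * tN) * 100, where the problem size is held fixed."""
    return t1 / (N * tN) * 100

# Hypothetical example: a fixed-size job takes 100 s on 1 core
# and 15 s on 8 cores.
print(strong_scaling_efficiency(100, 15, 8))  # ~83.3 (% of linear)
```

A perfectly linear run on 8 cores would take 12.5 s; the extra 2.5 s here is parallel overhead, which is why the efficiency falls below 100%.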
Weak Scaling
In weak scaling, the problem size (workload) assigned to each processing element stays constant, and additional elements are used to solve a larger total problem (one that would not fit in RAM on a single node, for instance). This kind of measurement is justified for programs that use a large amount of memory or another system resource (something that is memory-bound). In the case of weak scaling, linear scaling is achieved if the run time stays constant while the workload is increased in direct proportion to the number of processors. Most programs running in this mode should scale well to larger core counts, as they typically employ nearest-neighbor communication patterns, where the communication overhead is relatively constant regardless of the number of processes used; exceptions include algorithms that make heavy use of global communication patterns, e.g. FFTs and transposes.
Calculating Weak Scaling Efficiency
If the amount of time to complete a work unit with 1 processing element is t1, and the amount of time to complete N of the same work units with N processing elements is tN, the weak scaling efficiency (as a percentage of linear) is given as:
( t1 / tN ) * 100%
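The weak scaling formula can be sketched the same way; again, the timings are hypothetical and only illustrate the arithmetic.

```python
def weak_scaling_efficiency(t1, tN):
    """Weak scaling efficiency as a percentage of ideal (constant
    run time): (t1 / tN) * 100, where the per-processor workload
    is held fixed as processors are added."""
    return t1 / tN * 100

# Hypothetical example: 1 work unit on 1 core takes 100 s;
# N work units on N cores take 110 s.
print(weak_scaling_efficiency(100, 110))  # ~90.9 (% of ideal)
```

Note that, unlike strong scaling, the ideal here is that the run time does not change at all as the problem and processor count grow together, so N does not appear in the formula.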
--------------------------------------------------------------------------------------------------------------------------------------------
1b.
Fine-Grained Multithreading
• Switches between threads on each instruction, causing the execution of multiple threads to be interleaved
• Usually done in a round-robin fashion, skipping any stalled threads
• The CPU must be able to switch threads every clock cycle
• Advantage: it can hide both short and long stalls, since instructions from other threads are executed when one thread stalls
• Disadvantage: it slows down the execution of individual threads, since a thread ready to execute without stalls is delayed by instructions from other threads
• Used on Sun's Niagara
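The issue-slot diagram for part 1c is not reproduced in the question text, so the exact cycle counts cannot be computed here. As a sketch of how fine-grained round-robin switching hides stalls, the model below uses three made-up threads, each a list of "rows" giving how many of the four issue slots that thread could fill in a cycle (0 = stall row); all the values are hypothetical, not from the question.

```python
from collections import deque

def fine_grained_cycles(threads):
    """Count cycles to drain all threads when exactly one thread
    issues per cycle in round-robin order. Stall rows are dropped:
    the simplifying assumption is that some other thread is always
    ready, so every stall is fully hidden."""
    queues = deque(deque(r for r in t if r > 0) for t in threads)
    cycles = 0
    while queues:
        q = queues.popleft()
        q.popleft()           # this thread issues its next row
        cycles += 1
        if q:
            queues.append(q)  # round-robin: go to the back of the line
    return cycles

# Three hypothetical threads (issue slots used per cycle, 0 = stall):
a = [3, 2, 0, 0, 1]
b = [2, 0, 4, 1]
c = [1, 1, 0, 2]
print(fine_grained_cycles([a, b, c]))  # 9 non-stall rows -> 9 cycles
```

Under this model the total is simply the number of non-stall rows across all threads, which is exactly the "hides both short and long stalls" advantage listed above.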
- -
Coarse-Grained Multithreading
• Switches threads only on costly stalls, such as L2 cache misses
• Advantages
• Relieves the need for very fast thread switching
• Doesn't slow down the running thread, since instructions from other threads are issued only when that thread encounters a costly stall
• Disadvantage: it is hard to overcome throughput losses from shorter stalls, due to pipeline start-up costs
• Since the CPU issues instructions from one thread, when a stall occurs the pipeline must be flushed or frozen
• The new thread must fill the pipeline before its instructions can complete
• Because of this start-up overhead, coarse-grained multithreading is better at reducing the penalty of high-cost stalls, where pipeline refill time << stall time
• Used in the IBM AS/400
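A rough model of the coarse-grained policy from the question (switch only on a stall longer than 1 cycle) can be sketched as follows, using the same hypothetical thread rows as in the fine-grained sketch. The model is deliberately simplified: long stalls are assumed to overlap entirely with other threads' execution, and each switch-back costs a fixed pipeline-refill penalty.

```python
def coarse_grained_cycles(threads, switch_penalty=1):
    """Run each thread until a stall longer than 1 cycle, then switch
    away; short stalls (a single 0 row) are suffered in place. Each
    long stall costs `switch_penalty` refill cycles on return instead
    of the stall itself, which is assumed hidden by other threads."""
    cycles = 0
    for t in threads:
        i = 0
        while i < len(t):
            if t[i] == 0 and i + 1 < len(t) and t[i + 1] == 0:
                # Long stall (>1 cycle): switch away, skip the stall
                # rows, and pay the pipeline-refill cost on return.
                while i < len(t) and t[i] == 0:
                    i += 1
                cycles += switch_penalty
            else:
                cycles += 1  # issue row, or a single-cycle stall
                i += 1
    return cycles

a = [3, 2, 0, 0, 1]
b = [2, 0, 4, 1]
c = [1, 1, 0, 2]
print(coarse_grained_cycles([a, b, c]))  # 12 cycles under this model
```

Comparing with the fine-grained sketch (9 cycles on the same hypothetical threads) illustrates the bullet above: the single-cycle stalls in threads b and c are not hidden, and each long stall still costs a refill.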
- -
Simultaneous Multithreading
Simultaneous multithreading is a processor design that combines hardware multithreading with superscalar processor technology to allow multiple threads to issue instructions every cycle. Unlike other hardware multithreaded architectures (such as the Tera MTA), in which only a single hardware context (i.e., thread) is active in any given cycle, SMT permits all thread contexts to simultaneously compete for and share processor resources. Unlike conventional superscalar processors, which suffer from a lack of per-thread instruction-level parallelism, simultaneous multithreading uses multiple threads to compensate for low single-thread ILP. The performance result is significantly higher instruction throughput and program speedups on a variety of workloads, including commercial databases, web servers, and scientific applications, in both multiprogrammed and parallel environments.
Disadvantages
• There is greater register pressure and greater per-thread latency because of the longer pipeline.
• On a multiprogrammed workload, there is more stress on shared structures, for instance the branch predictor, caches, and TLB.
• A parallel workload tends to stress the functional units more.
• There is also a security concern with certain simultaneous multithreading implementations. Intel's Hyper-Threading implementation has had a vulnerability through which it is possible for one application to steal a cryptographic key from another application running on the same processor by monitoring its cache use.
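For completeness, an idealized SMT bound on the same hypothetical threads used in the earlier sketches: since SMT can fill issue slots from any thread in the same cycle, a greedy upper-bound model just packs all pending instructions into the four slots per cycle. Real SMT must still respect dependencies and stalls, so this is a best-case sketch, not the answer to 1c (whose diagram is not reproduced in the question text).

```python
import math

def smt_cycles(threads, slots=4):
    """Best-case SMT cycle count: every cycle, fill up to `slots`
    issue slots with instructions drawn from any thread, ignoring
    dependencies and stalls entirely."""
    total = sum(r for t in threads for r in t)
    return math.ceil(total / slots)

# Same hypothetical threads as before (issue slots used per cycle):
a = [3, 2, 0, 0, 1]
b = [2, 0, 4, 1]
c = [1, 1, 0, 2]
print(smt_cycles([a, b, c]))  # 17 instructions / 4 slots -> 5 cycles
```

On these made-up threads the ordering (5 < 9 < 12 cycles for SMT, fine-grained, and coarse-grained respectively) matches the qualitative discussion above: SMT recovers issue slots within a cycle, fine-grained hides stalls across cycles, and coarse-grained pays for short stalls and pipeline refills.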