Question
3.6) Why is floating-point addition generally slower than integer addition? After a floating-point computation, the hardware normalizes the binary-point number 0.00001. Which direction is the mantissa shifted, and by how many bit positions? How does the exponent need to be corrected?

3.7) What decimal floating-point number is represented by the IEEE 754 half-precision value 0xBB00? [Half-precision floats are represented like IEEE 754 single-precision floats, only they are 16 bits wide. The leftmost bit is the sign bit, the exponent is 5 bits wide and has a bias of 15, and the fraction is 10 bits long.]

3.8)

Explanation / Answer
Starting with the first question: why is floating-point addition generally slower than integer addition?
First, of course, it depends entirely on the CPU architecture in question.
However, integral and floating-point types are handled very differently, so the following is nearly always the case:
For simple operations, integral types are fast.
For example, integer addition often has only a single cycle's latency, and integer multiplication is typically around 2-4 cycles, IIRC.
Floating-point types used to perform much more slowly. On today's CPUs, however, they have excellent throughput: each floating-point unit can usually retire an operation per cycle, giving the same (or similar) throughput as integer operations.
However, latency is generally worse. Floating-point addition often has a latency around 4 cycles (vs 1 for ints).
For some complex operations, the situation is different, or even reversed. For example, floating-point division may have lower latency than integer division, simply because the operation is complex to implement in both cases, but it is more commonly useful on floating-point values, so more effort (and transistors) may be spent optimizing that case.
On some CPUs, doubles may be significantly slower than floats. On some architectures, there is no dedicated hardware for doubles, and so they are handled by passing two float-sized chunks through, giving you a worse throughput and twice the latency.
On others (the x86 FPU, for example), both types are converted to the same internal format (80-bit floating point, in the case of x86), so performance is identical.