Pentium III's SSE Implementation

Because Katmai was built on the same 0.25 µm process as the Pentium II "Deschutes", it had to implement SSE using as little silicon as possible. To achieve this, Intel implemented the 128-bit architecture by double-cycling the existing 64-bit data paths and by merging the SIMD-FP multiplier unit with the x87 scalar FPU multiplier into a single unit. To use the existing 64-bit data paths, Katmai issues each SIMD-FP instruction as two μops. To compensate partially for implementing only half of SSE’s architectural width, Katmai implements the SIMD-FP adder as a separate unit on the second dispatch port. This organization allows one half of a SIMD multiply and one half of an independent SIMD add to be issued together. Since each half operates on two of the four single-precision values in a 128-bit register, this brings the peak throughput back to four floating-point operations per cycle, at least for code with an even mix of multiplies and adds.
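As a rough illustration of this throughput arithmetic, the sketch below models Katmai's two SIMD-FP ports as a toy issue counter. It is a simplification under stated assumptions, not Intel's published timing model: every packed instruction is assumed independent, each port is assumed to accept one 64-bit μop per cycle, and decode, memory, and competing x87 traffic are ignored. The names `katmai_cycles` and `flops_per_cycle` are hypothetical.

```python
FLOATS_PER_SSE_OP = 4  # a 128-bit register holds four single-precision floats
UOPS_PER_SSE_OP = 2    # Katmai splits each 128-bit op into two 64-bit halves


def katmai_cycles(n_mul: int, n_add: int) -> int:
    """Cycles to issue n_mul MULPS and n_add ADDPS in this toy model.

    Assumption: no dependencies, and each port takes one 64-bit uop per cycle.
    """
    mul_uops = n_mul * UOPS_PER_SSE_OP  # all halves go to the multiplier port
    add_uops = n_add * UOPS_PER_SSE_OP  # all halves go to the adder port
    # The two ports work in parallel, so the busier port sets the cycle count.
    return max(mul_uops, add_uops)


def flops_per_cycle(n_mul: int, n_add: int) -> float:
    """Average single-precision operations completed per cycle."""
    cycles = katmai_cycles(n_mul, n_add)
    if cycles == 0:
        return 0.0  # empty instruction stream
    return (n_mul + n_add) * FLOATS_PER_SSE_OP / cycles


if __name__ == "__main__":
    print(flops_per_cycle(100, 100))  # even mix: both ports busy every cycle
    print(flops_per_cycle(200, 0))    # multiplies only: the adder port idles
```

Under these assumptions an even mix reaches the peak of 4.0 operations per cycle, while an all-multiply stream leaves the adder port idle and falls to 2.0, which is why the peak figure depends on the balance of multiplies and adds.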

The problem was that Katmai’s hardware implementation contradicted the parallelism model implied by the SSE instruction set. Programmers faced a code-scheduling dilemma: should SSE code be tuned for Katmai's limited execution resources, or for a future processor with more resources? Katmai-specific SSE optimizations yielded the best possible performance on the Pentium III family but were suboptimal for later Intel processors, such as the Pentium 4 and Core.
