Quadruple-precision Floating-point Format


In computing, quadruple precision (also commonly shortened to quad precision) is a binary floating-point computer number format that occupies 16 bytes (128 bits) in computer memory.

This 128-bit quadruple-precision format is designed not only for applications that require results in higher than double precision, but also, as a primary function, to allow double-precision results to be computed more reliably and accurately by minimising overflow and round-off errors in intermediate calculations and scratch variables. As William Kahan, primary architect of the original IEEE 754 floating-point standard, noted: "For now the 10-byte Extended format is a tolerable compromise between the value of extra-precise arithmetic and the price of implementing it to run fast; very soon two more bytes of precision will become tolerable, and ultimately a 16-byte format... That kind of gradual evolution towards wider precision was already in view when IEEE Standard 754 for Floating-Point Arithmetic was framed."
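The benefit of carrying intermediates at wider precision can be illustrated with a small Python sketch. Standard Python has no native binary128 type, so this sketch emulates rounding to binary128's 113-bit significand (112 stored fraction bits plus the implicit leading bit) using exact `fractions.Fraction` arithmetic, and it ignores the exponent range entirely (no overflow, underflow, or subnormals). The names `round_to_bits` and `QUAD_BITS` are hypothetical helpers introduced for illustration. In plain double precision, 10^16 + 1 is an exact tie and rounds to 10^16, so subtracting 10^16 afterwards loses the 1. With 113-bit intermediates the sum is exact, and the single final rounding to double gives 1.0.

```python
from fractions import Fraction


def round_to_bits(x: Fraction, p: int) -> Fraction:
    """Round x to the nearest value with a p-bit significand (ties to even).

    Simplification: the exponent range is ignored (no overflow,
    underflow, or subnormals), unlike a real binary128 implementation.
    """
    if x == 0:
        return Fraction(0)
    sign = -1 if x < 0 else 1
    a = abs(x)
    # Find e with 2**e <= a < 2**(e + 1).
    e = a.numerator.bit_length() - a.denominator.bit_length()
    if Fraction(2) ** e > a:
        e -= 1
    # Scale so the integer part of `scaled` holds exactly p significant bits.
    scaled = a * Fraction(2) ** (p - 1 - e)
    q, r = divmod(scaled.numerator, scaled.denominator)
    # Round half to even, using the discarded remainder r / denominator.
    twice_r = 2 * r
    if twice_r > scaled.denominator or (twice_r == scaled.denominator and q % 2 == 1):
        q += 1
    return sign * q * Fraction(2) ** (e - p + 1)


QUAD_BITS = 113  # binary128 precision: 112 stored fraction bits + 1 implicit bit

big, one = 10**16, 1

# Entirely in double precision: 1e16 + 1 is a tie, rounds to 1e16, and the 1 is lost.
double_result = (float(big) + float(one)) - float(big)

# Intermediates rounded to a 113-bit significand, then one final rounding to double.
t = round_to_bits(Fraction(big) + one, QUAD_BITS)
quad_result = float(round_to_bits(t - big, QUAD_BITS))

print(double_result, quad_result)  # 0.0 1.0
```

Computing in the wider format and rounding once to double is what shields the result here; in general this reduces accumulated round-off but does not by itself guarantee a correctly rounded double result.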

In IEEE 754-2008 the 128-bit base-2 format is officially referred to as binary128.
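The text above does not spell out how binary128 divides its 128 bits. The following minimal Python sketch assumes the standard IEEE 754 field split: 1 sign bit, a 15-bit exponent biased by 16383, and a 112-bit trailing significand with an implicit leading 1 for normal numbers. It decodes a raw bit pattern (held in a Python `int`) into an exact `Fraction`. The function `decode_binary128` and its string results for infinities and NaN are hypothetical choices made for illustration.

```python
from fractions import Fraction

BIAS = 16383          # exponent bias for binary128
FRACTION_BITS = 112   # stored significand bits (the leading 1 is implicit)


def decode_binary128(bits: int):
    """Decode a 128-bit pattern into an exact Fraction, or 'inf'/'-inf'/'nan'."""
    sign = bits >> 127
    exponent = (bits >> FRACTION_BITS) & 0x7FFF      # 15-bit biased exponent
    fraction = bits & ((1 << FRACTION_BITS) - 1)      # 112-bit trailing significand
    if exponent == 0x7FFF:                             # all ones: infinity or NaN
        if fraction:
            return "nan"
        return "-inf" if sign else "inf"
    if exponent == 0:                                  # zero or subnormal: no implicit 1
        significand = Fraction(fraction, 1 << FRACTION_BITS)
        value = significand * Fraction(2) ** (1 - BIAS)
    else:                                              # normal: implicit leading 1
        significand = 1 + Fraction(fraction, 1 << FRACTION_BITS)
        value = significand * Fraction(2) ** (exponent - BIAS)
    return -value if sign else value


print(decode_binary128(0x3FFF << 112))  # 1
print(decode_binary128(0xC000 << 112))  # -2
```

The smallest positive subnormal, bit pattern `1`, decodes to 2^-16494, which shows how far the 15-bit exponent extends the range beyond double precision.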

Floating-point precisions
IEEE 754
  • 16-bit: Half (binary16)
  • 32-bit: Single (binary32), decimal32
  • 64-bit: Double (binary64), decimal64
  • 128-bit: Quadruple (binary128), decimal128
  • Extended precision formats
Other
  • Minifloat
  • Arbitrary precision
