Quadruple-precision Floating-point Format - IEEE 754 Quadruple-precision Binary Floating-point Format: Binary128

The IEEE 754 standard specifies a binary128 as having:

  • Sign bit: 1 bit
  • Exponent width: 15 bits
  • Significand precision: 113 bits (112 explicitly stored)

This gives 33 to 36 significant decimal digits of precision. If a decimal string with at most 33 significant digits is converted to IEEE 754 quadruple precision and then converted back to the same number of significant digits, the final string should match the original. Conversely, if an IEEE 754 quadruple-precision number is converted to a decimal string with at least 36 significant digits and then converted back to quadruple precision, the final number must match the original.
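The two digit counts can be derived from the 113-bit precision. The following is a minimal sketch, assuming the commonly used bounds floor((p - 1) * log10(2)) and ceil(1 + p * log10(2)) for a binary format with precision p. The function name round_trip_digits is hypothetical and chosen only for illustration.

```python
import math

def round_trip_digits(p):
    """Return (lower, upper) decimal digit counts for a binary format of precision p.

    lower: decimal strings with at most this many significant digits
           survive decimal -> binary -> decimal.
    upper: printing at least this many significant digits lets every value
           survive binary -> decimal -> binary.
    """
    log10_2 = math.log10(2)
    lower = math.floor((p - 1) * log10_2)
    upper = math.ceil(1 + p * log10_2)
    return lower, upper

print(round_trip_digits(113))  # (33, 36) for binary128
print(round_trip_digits(53))   # (15, 17) for binary64, as a sanity check
```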

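The format is written with an implicit leading bit with value 1 unless the exponent field is stored as all zeros. Thus only 112 bits of the significand appear in the memory format, but the total precision is 113 bits (approximately 34 decimal digits, since log10(2^113) ≈ 34.016). The bits are laid out as follows, from most significant to least significant: 1 sign bit (bit 127), 15 exponent bits (bits 126 to 112), and 112 significand bits (bits 111 to 0).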
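To make the role of the implicit leading bit concrete, here is a minimal sketch that decodes a 128-bit pattern into an exact rational value. It assumes the usual field order (sign, then 15 exponent bits, then 112 significand bits, from most to least significant) and an exponent bias of 16383, which is not stated above. The function name decode_binary128 is hypothetical, and infinities and NaNs are rejected rather than represented.

```python
from fractions import Fraction

EXP_BITS = 15
FRAC_BITS = 112
BIAS = (1 << (EXP_BITS - 1)) - 1  # 16383 (assumed, not given in the text above)

def decode_binary128(bits):
    """Decode a 128-bit integer bit pattern into an exact Fraction (finite values only)."""
    sign = bits >> 127
    exponent = (bits >> FRAC_BITS) & ((1 << EXP_BITS) - 1)
    fraction = bits & ((1 << FRAC_BITS) - 1)

    if exponent == (1 << EXP_BITS) - 1:
        raise ValueError("infinity or NaN is not representable as a Fraction")

    if exponent == 0:
        # All-zero exponent field: the implicit leading bit is 0 (zero or subnormal).
        significand = Fraction(fraction, 1 << FRAC_BITS)
        scale = 1 - BIAS
    else:
        # Normal number: the implicit leading bit is 1.
        significand = 1 + Fraction(fraction, 1 << FRAC_BITS)
        scale = exponent - BIAS

    value = significand * Fraction(2) ** scale
    return -value if sign else value

print(decode_binary128(0x3FFF << 112))                 # 1
print(decode_binary128((0x3FFF << 112) | (1 << 111)))  # 3/2
print(decode_binary128(0xC000 << 112))                 # -2
```

Using Fraction keeps the result exact, which a 64-bit Python float could not do for most binary128 values.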

A binary256 would have a significand precision of 237 bits (approximately 71 decimal digits) and exponent bias 262143.
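The binary256 figures can be reproduced from the total width alone. The sketch below assumes the IEEE 754-2008 interchange-format rule for widths k of at least 128 bits that are multiples of 32: exponent width w = round(4 * log2(k)) - 13, precision p = k - w, and bias 2^(w - 1) - 1. The function name interchange_parameters is hypothetical.

```python
import math

def interchange_parameters(k):
    """Return (exponent_width, precision, bias) for a binary{k} interchange format.

    Assumes k >= 128 and k is a multiple of 32.
    """
    if k < 128 or k % 32 != 0:
        raise ValueError("formula applies to multiples of 32 that are at least 128")
    w = round(4 * math.log2(k)) - 13  # exponent field width in bits
    p = k - w                         # precision, counting the implicit bit
    bias = (1 << (w - 1)) - 1
    return w, p, bias

print(interchange_parameters(128))  # (15, 113, 16383)
print(interchange_parameters(256))  # (19, 237, 262143)
```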
