IEEE Floating Point - Formats

Formats

An IEEE 754 format is a "set of representations of numerical values and symbols". A format may also include how the set is encoded.

A format comprises:

  • Finite numbers, which may be either base 2 (binary) or base 10 (decimal). Each finite number is described by three integers: s = a sign (zero or one), c = a significand (or 'coefficient'), q = an exponent. The numerical value of a finite number is
    $(-1)^s \times c \times b^q$
    where b is the base (2 or 10). For example, if the sign is 1 (indicating negative), the significand is 12345, the exponent is −3, and the base is 10, then the value of the number is −12.345.
  • Two infinities: +∞ and −∞.
  • Two kinds of NaN: a quiet NaN (qNaN) and a signaling NaN (sNaN). A NaN may carry a payload that is intended for diagnostic information indicating the source of the NaN. The sign of a NaN has no meaning, but it may be predictable in some circumstances.
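The value formula for finite numbers can be checked with a few lines of Python. This is only an illustrative sketch: the helper name `finite_value` is hypothetical (it is not part of the standard or of any library), and it uses exact rational arithmetic from the standard `fractions` module so that no rounding hides the result.

```python
from fractions import Fraction


def finite_value(s: int, c: int, q: int, b: int) -> Fraction:
    """Return the exact value (-1)**s * c * b**q as a rational number.

    s is the sign (0 or 1), c the significand, q the exponent,
    and b the base (2 or 10). Hypothetical helper for illustration.
    """
    if s not in (0, 1):
        raise ValueError("sign must be 0 or 1")
    if b not in (2, 10):
        raise ValueError("base must be 2 or 10")
    if c < 0:
        raise ValueError("significand must be non-negative")
    # Fraction supports negative integer exponents exactly.
    return (-1) ** s * c * Fraction(b) ** q


# The example from the text: sign 1, significand 12345, exponent -3, base 10.
value = finite_value(1, 12345, -3, 10)
print(float(value))  # -12.345
```

The same helper works for base 2; for instance, sign 0, significand 3, and exponent −1 give 3/2.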

The possible finite values that can be represented in a format are determined by the base (b), the number of digits in the significand (precision, p), and the exponent parameter emax:

  • c must be an integer in the range zero through $b^p - 1$ (e.g., if b=10 and p=7 then c is 0 through 9999999)
  • q must be an integer such that $1 - \mathit{emax} \le q + p - 1 \le \mathit{emax}$ (e.g., if p=7 and emax=96 then q is −101 through 90).
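As a quick check of these two constraints, the bounds can be computed directly from b, p, and emax. The sketch below assumes the exponent constraint reads 1 − emax ≤ q + p − 1 ≤ emax, which rearranges to 2 − emax − p ≤ q ≤ emax − p + 1. The function names are hypothetical and chosen only for this illustration.

```python
def significand_range(b: int, p: int) -> tuple[int, int]:
    """Inclusive range of the significand c: 0 through b**p - 1."""
    return 0, b**p - 1


def exponent_range(p: int, emax: int) -> tuple[int, int]:
    """Inclusive range of the exponent q.

    Derived from 1 - emax <= q + p - 1 <= emax by subtracting (p - 1)
    from all three parts.
    """
    return 2 - emax - p, emax - p + 1


# Example parameters from the text: b = 10, p = 7, emax = 96.
print(significand_range(10, 7))  # (0, 9999999)
print(exponent_range(7, 96))     # (-101, 90)
```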

Hence (for the example parameters) the smallest non-zero positive number that can be represented is $1 \times 10^{-101}$ and the largest is $9999999 \times 10^{90}$ ($9.999999 \times 10^{96}$), and the full range of numbers is $-9.999999 \times 10^{96}$ through $9.999999 \times 10^{96}$. The numbers $-b^{1-\mathit{emax}}$ and $b^{1-\mathit{emax}}$ (here, $-1 \times 10^{-95}$ and $1 \times 10^{-95}$) are the smallest (in magnitude) normal numbers; non-zero numbers between these smallest numbers are called subnormal numbers.
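The extremes quoted above follow mechanically from the format parameters. The following sketch recomputes them with exact rationals and classifies a magnitude as zero, subnormal, or normal. Both function names are hypothetical, and `classify` compares magnitudes only: it does not check whether a value is exactly representable with p digits.

```python
from fractions import Fraction


def format_extremes(b: int, p: int, emax: int):
    """Return (smallest positive, largest finite, smallest normal) exactly."""
    q_min = 2 - emax - p          # lowest allowed exponent q
    q_max = emax - p + 1          # highest allowed exponent q
    smallest_positive = Fraction(b) ** q_min             # c = 1, q = q_min
    largest = (b**p - 1) * Fraction(b) ** q_max          # c = b**p - 1, q = q_max
    smallest_normal = Fraction(b) ** (1 - emax)
    return smallest_positive, largest, smallest_normal


def classify(x: Fraction, b: int, p: int, emax: int) -> str:
    """Classify x by magnitude only (representability is not checked)."""
    _, largest, smallest_normal = format_extremes(b, p, emax)
    magnitude = abs(x)
    if magnitude == 0:
        return "zero"
    if magnitude < smallest_normal:
        return "subnormal"
    if magnitude <= largest:
        return "normal"
    return "out of range"


# Example parameters from the text: b = 10, p = 7, emax = 96.
tiny, huge, min_normal = format_extremes(10, 7, 96)
print(tiny == Fraction(1, 10**101))             # True
print(huge == 9999999 * 10**90)                 # True
print(classify(Fraction(1, 10**100), 10, 7, 96))  # subnormal
```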

Zero values are finite values with significand 0. These are signed zeros: the sign bit specifies whether a zero is +0 (positive zero) or −0 (negative zero).
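Signed zeros can be observed from Python, assuming (as is the case on mainstream platforms) that Python's `float` is an IEEE 754 binary64 value. The two zeros compare equal, so the sketch reads the sign with `math.copysign` instead. The helper name `zero_sign_bit` is hypothetical.

```python
import math


def zero_sign_bit(x: float) -> int:
    """Return the sign bit (0 or 1) of a floating-point zero."""
    if x != 0.0:
        raise ValueError("expected a zero value")
    # copysign transfers the sign of x onto 1.0, exposing the sign of -0.0.
    return 1 if math.copysign(1.0, x) < 0 else 0


print(0.0 == -0.0)          # True: the two zeros compare equal
print(zero_sign_bit(0.0))   # 0
print(zero_sign_bit(-0.0))  # 1
```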
