Binary Scaling

Binary scaling is a computer programming technique used mainly by embedded C, DSP and assembler programmers to perform pseudo floating point arithmetic using integer arithmetic. On processors without floating point hardware it is typically faster than floating point, and for a given word size it can be more accurate, because every bit holds significant digits rather than an exponent. However, care must be taken not to cause an arithmetic overflow.

A position for the virtual 'binary point' is chosen, and subsequent arithmetic operations then determine the binary point of the result.

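Binary points obey the mathematical laws of exponentiation: multiplying two scaled values adds their scale exponents, and dividing one by the other subtracts them.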
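The exponent rule can be sketched in C as follows. This is a minimal illustration, not a definitive implementation; the helper names scaled_mul and rescale are hypothetical, and the scale is expressed simply as the power of two by which a real value was multiplied.

```c
#include <assert.h>
#include <stdint.h>

/* Multiply two scaled integers. If a holds x * 2^m and b holds y * 2^n,
   the product holds (x * y) * 2^(m + n): the scale exponents add. */
static inline int64_t scaled_mul(int32_t a, int32_t b)
{
    return (int64_t)a * (int64_t)b;
}

/* Move a value from scale 2^from to scale 2^to.
   Scaling down truncates toward zero. */
static inline int64_t rescale(int64_t v, int from, int to)
{
    assert(from >= 0 && to >= 0 && from < 63 && to < 63);
    return (to >= from) ? v * ((int64_t)1 << (to - from))
                        : v / ((int64_t)1 << (from - to));
}
```

For example, 2.5 scaled by 2^3 is 20 and 1.25 scaled by 2^5 is 40. scaled_mul(20, 40) returns 800, which is 3.125 scaled by 2^8, and rescale(800, 8, 3) returns 25, which is 3.125 scaled by 2^3.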

To give an example, a common way to use integer arithmetic to simulate floating point is to multiply the coefficients by 65536.

This will place the binary point at B16.

For instance, to represent the real numbers 1.2 and 5.6 at B16, one multiplies them by 2^16 and truncates the results to integers, giving

78643 and 367001
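The conversion above might be written in C as the following sketch. The name to_b16 and the macro B16_ONE are assumptions made for illustration, and the range check assumes the value is stored in a signed 32-bit integer.

```c
#include <assert.h>
#include <stdint.h>

#define B16_ONE 65536 /* 2^16, the scale factor (hypothetical name) */

/* Convert a real number to a scaled 32-bit integer by multiplying by 2^16.
   The cast truncates toward zero, matching the values quoted in the text. */
static inline int32_t to_b16(double x)
{
    assert(x >= -32768.0 && x < 32768.0); /* must fit in a signed 32-bit word */
    return (int32_t)(x * B16_ONE);
}
```

With this sketch, to_b16(1.2) yields 78643 and to_b16(5.6) yields 367001.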

Multiplying these together gives

28862059643

The product carries a scale factor of 2^16 × 2^16 = 2^32. To convert it back to B16, divide it by 2^16.

This gives 440400 at B16, which when converted back to a floating point number (by dividing again by 2^16, but holding the result as floating point) gives approximately 6.71997. The correct floating point result is 6.72.
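The multiplication and conversion back can be sketched in C as below, assuming a 64-bit intermediate type is available. The function names b16_mul and b16_to_double are hypothetical, chosen only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Multiply two values scaled by 2^16 and return the product at the same
   scale. The 64-bit intermediate holds the product at scale 2^32;
   dividing by 2^16 restores the original scale (truncating toward zero). */
static inline int32_t b16_mul(int32_t a, int32_t b)
{
    int64_t wide = (int64_t)a * (int64_t)b;
    int64_t r = wide / 65536;
    assert(r >= INT32_MIN && r <= INT32_MAX); /* result must fit in 32 bits */
    return (int32_t)r;
}

/* Convert a value scaled by 2^16 back to floating point. */
static inline double b16_to_double(int32_t v)
{
    return (double)v / 65536.0;
}
```

Here b16_mul(78643, 367001) returns 440400, and b16_to_double(440400) returns 6.719970703125, slightly below the exact 6.72 because of the truncation at each step.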

With signed 32-bit operands, the scaling range here covers any number from -32768.0 up to about 32767.99998, with 16 bits to hold fractional quantities (assuming, of course, that a 64-bit register holds the intermediate product). Note that some computer architectures restrict arithmetic to 32-bit results. In this case extreme care must be taken not to overflow the 32-bit register. For other number ranges the binary scale can be adjusted for optimum accuracy.
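One possible way to stay within 32-bit arithmetic, shown here only as a sketch and not as the method the text prescribes, is to reduce each operand's scale from 2^16 to 2^8 before multiplying, so that the product lands directly at scale 2^16. The name b16_mul_32 is hypothetical. The approach trades accuracy for range, and the caller is still responsible for keeping the product within 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Multiply two values scaled by 2^16 using only 32-bit arithmetic.
   Each operand is first reduced to scale 2^8 (discarding 8 fractional
   bits), so the product has scale 2^8 * 2^8 = 2^16. */
static inline int32_t b16_mul_32(int32_t a, int32_t b)
{
    int32_t a8 = a / 256;
    int32_t b8 = b / 256;

    /* Guard against 32-bit overflow of the product. */
    int32_t ma = a8 < 0 ? -a8 : a8;
    int32_t mb = b8 < 0 ? -b8 : b8;
    assert(mb == 0 || ma <= INT32_MAX / mb);

    return a8 * b8;
}
```

For the operands used earlier, b16_mul_32(78643, 367001) returns 439931, which corresponds to about 6.7128 rather than 6.72. The discarded low bits are the price of avoiding the 64-bit intermediate.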