Dither - Digital Audio


In audio, dither can be useful to break up periodic limit cycles, which are a common problem in digital filters. Random noise is typically less objectionable than the harmonic tones produced by limit cycles.

In 1987, Lipshitz and Vanderkooy pointed out that different noise types, with different probability density functions, behave differently when used as dither signals, and suggested optimal levels of dither signals for audio.

In an analog system, the signal is continuous, but in a PCM digital system the amplitude of the signal is limited to one of a set of fixed values. This process is called quantization. Each coded value is a discrete step; if a signal is quantized without using dither, the quantization error is correlated with the input signal, and that correlation manifests as distortion. To prevent this, the signal is "dithered": a small amount of noise is added before quantization, which decorrelates the quantization error from the signal and replaces the signal-dependent distortion with a constant, fixed noise level.

The final version of audio that goes onto a compact disc contains only 16 bits per sample, but throughout the production process a greater number of bits is typically used to represent each sample. In the end, the digital data must be reduced to 16 bits for pressing onto a CD and distribution.

There are multiple ways to do this. One can, for example, simply discard the excess bits, a process called truncation. One can also round to the nearest value the smaller word length can represent. Each of these methods, however, results in predictable and determinable errors in the result. Take, for example, a waveform that consists of the following values:

1 2 3 4 5 6 7 8

If we reduce our waveform's amplitude by, say, 20%, we end up with the following values:

0.8 1.6 2.4 3.2 4.0 4.8 5.6 6.4

If we truncate these values we end up with the following data:

0 1 2 3 4 4 5 6

If we instead round these values we end up with the following data:

1 2 2 3 4 5 6 6
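As a brief illustrative sketch (not part of the original text), both reductions can be reproduced in a few lines of Python:

```python
import math

samples = [1, 2, 3, 4, 5, 6, 7, 8]
scaled = [0.8 * s for s in samples]  # reduce amplitude by 20%

truncated = [math.floor(v) for v in scaled]  # discard the fractional part
rounded = [round(v) for v in scaled]         # round to the nearest integer

print(truncated)  # [0, 1, 2, 3, 4, 4, 5, 6]
print(rounded)    # [1, 2, 2, 3, 4, 5, 6, 6]
```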

For any original waveform, reducing the amplitude by 20% and then quantizing the result introduces regular, repeating errors. Take for example a sine wave that, for some portion, matches the values above. Every time the sine wave's value hits 3.2, the truncated result is off by 0.2, as in the sample data above; every time it hits 4.0, there is no error, since the truncated result is off by 0.0. The magnitude of this error changes regularly and repeatedly throughout the sine wave's cycle, and it is precisely this error which manifests itself as distortion. What the ear hears as distortion is the additional content at discrete frequencies created by the regular, repeated quantization error.
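The periodicity of the error is easy to see numerically. A small sketch, taking the error to be the scaled value minus its truncated result:

```python
import math

scaled = [0.8 * s for s in range(1, 9)]        # 0.8, 1.6, ..., 6.4
errors = [v - math.floor(v) for v in scaled]   # truncation error per sample
print([round(e, 1) for e in errors])  # [0.8, 0.6, 0.4, 0.2, 0.0, 0.8, 0.6, 0.4]
```

The error cycles through the same five values (0.8, 0.6, 0.4, 0.2, 0.0) over and over; that repetition at a fixed rate is what produces the tones at discrete frequencies that the ear hears as distortion.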

A plausible solution would be to take the value 4.8 and round it one direction one time and the other direction the next time: say, to 5 the first time and to 4 the next. This would make the long-term average 4.5 instead of 4, so that over the long term the value is closer to its actual value. This, however, still results in a determinable (though more complicated) error. Every other time the value 4.8 comes up the error is +0.2, and the other times it is −0.8. This still results in a repeating, quantifiable error.
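Sketching this alternating scheme (hypothetical code, just to show the average and the repeating error):

```python
# Alternate between rounding 4.8 up to 5 and down to 4.
alternated = [5 if i % 2 == 0 else 4 for i in range(10)]  # 5, 4, 5, 4, ...
average = sum(alternated) / len(alternated)
print(average)  # 4.5 -- closer to 4.8 than truncation's 4, but the error
                # sequence (+0.2, -0.8, +0.2, ...) still repeats exactly
```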

Another plausible solution would be to take 4.8 and round it so that the first four times out of five it rounded up to 5, and the fifth time it rounded to 4. This would average out to exactly 4.8 over the long term. Unfortunately, however, it still results in repeatable and determinable errors, and those errors still manifest themselves as distortion to the ear (though oversampling can reduce this).
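The four-out-of-five scheme can be sketched the same way:

```python
pattern = [5, 5, 5, 5, 4]                     # round up four times, down once
shaped = [pattern[i % 5] for i in range(10)]
average = sum(shaped) / len(shaped)
print(average)  # 4.8 -- the long-term average is now exact, yet the error
                # still cycles through the same fixed pattern
```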

This leads to the dither solution. Rather than predictably rounding up or down in a repeating pattern, what if we rounded up or down in a random pattern? If we came up with a way to randomly toggle our results between 4 and 5 so that 80% of the time it ended up on 5 then we would average 4.8 over the long run but would have random, non-repeating error in the result. This is done through dither.

We calculate a series of random numbers between 0.0 and 0.9 (e.g., 0.6, 0.1, 0.3, 0.6, 0.9, and so on) and add them to the scaled values before truncating. Two times out of ten the result will truncate back to 4 (when 0.0 or 0.1 is added to 4.8), and the other eight times it will truncate to 5; each occurrence thus has a random 20% chance of ending at 4 and an 80% chance of ending at 5. Over the long haul this produces results that average to 4.8 and a quantization error that is random, in other words noise. This "noise" result is less offensive to the ear than the determinable distortion that would result otherwise.
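A minimal sketch of this idea, assuming a continuous uniform noise source on [0, 1) rather than the discrete tenths used in the example above:

```python
import math
import random

random.seed(0)  # fixed seed so the sketch is reproducible

value = 4.8
n = 100_000
# Add uniform noise on [0, 1) before truncating: value + noise lies in
# [4.8, 5.8), so it truncates to 4 about 20% of the time and to 5 about 80%.
dithered = [math.floor(value + random.random()) for _ in range(n)]
average = sum(dithered) / n
print(average)  # close to 4.8; the residual error is random noise, not a pattern
```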

