Binary-to-text Encoding - Encoding Standards

Encoding Standards

The table below compares the most widely used binary-to-text encodings.

Encoding | Data type | Efficiency | Programming language implementations | Comments
Ascii85 | Arbitrary | 4/5 | awk, C, C#, F#, Java, Perl, Python, Python (2) |
Base16 (hexadecimal) | Arbitrary | 1/2 | Most languages |
Base32 | Arbitrary | 5/8 | ANSI C |
Base64 | Arbitrary | 3/4 | C, C (2), many others |
BinHex | Arbitrary | 3/4 (BinHex 2.0 and later) | Perl, C, C (2) | Used on classic Mac OS; now rarely seen
Intel HEX | Arbitrary | under 1/2 | C library, C++ | Typically used for programming (flashing) chips
MIME | Arbitrary | See Quoted-printable and Base64 | See Quoted-printable and Base64 | Encoding container for e-mail-style formatting
S-record | Arbitrary | under 1/2 | C library, C++ | Typically used for programming (flashing) chips
Percent encoding | Text (URIs), Arbitrary (RFC 1738) | 1/3 minimum; usually about 40% to 70% | C, probably many others |
Quoted-printable | Text | about 1/3 minimum; close to 1 for mostly-ASCII text | Probably many | Preserves line breaks; limits encoded lines to 76 characters
Uuencoding | Arbitrary | ~3/4 (usually about 60% overall) | Perl, C, probably many others | Largely replaced by MIME and yEnc
Xxencoding | Arbitrary | ~3/4 (overall similar to uuencoding) | C |
yEnc | Arbitrary, mostly non-text | ~98% | C | Includes a CRC checksum
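The efficiency figures for the simpler encodings can be checked directly with Python's standard base64 module, which implements Base16, Base32, Base64 and Ascii85. This is a minimal sketch, assuming "efficiency" means input bytes divided by output characters with no line breaks, headers or padding; the sample data is a hypothetical stand-in for arbitrary binary input.

```python
import base64

# Hypothetical sample input: 3060 bytes, a multiple of 3, 4 and 5, so that no
# encoder needs padding. It contains no zero bytes, because Python's a85encode
# abbreviates an all-zero 4-byte group as "z", which would skew the ratio.
data = bytes(range(1, 256)) * 12

encoders = {
    "Base16": base64.b16encode,
    "Base32": base64.b32encode,
    "Base64": base64.b64encode,
    "Ascii85": base64.a85encode,
}

def efficiency(encode, payload: bytes) -> float:
    """Input bytes per output character, with no line wrapping or headers."""
    return len(payload) / len(encode(payload))

for name, encode in encoders.items():
    print(f"{name:8} {efficiency(encode, data):.3f}")
```

On this input the ratios come out as 0.5, 0.625, 0.75 and 0.8, matching the 1/2, 5/8, 3/4 and 4/5 entries. Real formats that wrap lines (MIME base64, uuencoding) add newline characters and so land somewhat lower.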

The 95 character codes from 32 to 126, for which the C function isprint() returns true in the default "C" locale, are known as the ASCII printable characters.

Older formats that are uncommon today include BOO, BTOA, and USR encoding.

Most of these encodings generate text containing only a subset of the ASCII printable characters. For example, base64 produces text that contains only uppercase and lowercase letters (A–Z, a–z), digits (0–9), and the "+", "/", and "=" symbols.
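The base64 claim is easy to check empirically. This minimal sketch, using Python's standard base64 and string modules, encodes every possible byte value and confirms that only the 65 characters listed above appear; the helper name base64_chars_used is hypothetical.

```python
import base64
import string

# Characters the text lists for standard base64: A-Z, a-z, 0-9, "+", "/" and "=".
BASE64_ALPHABET = set(string.ascii_uppercase + string.ascii_lowercase
                      + string.digits + "+/=")

def base64_chars_used(data: bytes) -> set:
    """Return the distinct characters that appear in the base64 encoding of data."""
    return set(base64.b64encode(data).decode("ascii"))

sample = bytes(range(256))  # every byte value 0-255 once
used = base64_chars_used(sample)
print(used <= BASE64_ALPHABET)  # True: no other characters appear
print("=" in used)              # True: 256 bytes is not a multiple of 3, so padding is added
```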

Some of these encodings (quoted-printable and percent encoding) are based on a set of allowed characters and a single escape character. Allowed characters are left unchanged, while every other character is converted into a string that starts with the escape character ("=" in quoted-printable, "%" in percent encoding). This keeps the result largely human-readable: letters and digits are among the allowed characters, so they appear unchanged in the encoded text. For input that is mostly printable ASCII, these encodings produce the shortest plain-ASCII output.
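The escape-character approach can be seen side by side with Python's standard library: urllib.parse.quote for percent encoding and quopri for quoted-printable. This is a sketch using each function's default behavior, with a hypothetical sample string that is mostly ASCII; note how the plain words survive unchanged while the non-ASCII bytes and the "=" sign are escaped.

```python
import quopri
from urllib.parse import quote, unquote

text = "Café au lait = 3€"  # hypothetical sample: mostly ASCII, a few non-ASCII characters
raw = text.encode("utf-8")

# Percent encoding: letters, digits and "-._~" pass through; every other byte becomes "%XX".
pct = quote(raw, safe="")
# Quoted-printable: printable ASCII passes through; "=" and non-ASCII bytes become "=XX".
qp = quopri.encodestring(raw).decode("ascii")

print(pct)  # Caf%C3%A9%20au%20lait%20%3D%203%E2%82%AC
print(qp)   # Caf=C3=A9 au lait =3D 3=E2=82=AC

# Both encodings are reversible.
assert unquote(pct) == text
assert quopri.decodestring(qp.encode("ascii")) == raw
```

Percent encoding escapes spaces as well, because a space is not allowed in a URI, whereas quoted-printable leaves them alone; this is why the quoted-printable output is shorter for ordinary text.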

Other encodings (base64, uuencoding) map every possible six-bit sequence to a distinct printable character. This is possible because there are more than 2^6 = 64 printable characters. A sequence of bytes is encoded by treating it as a stream of bits, splitting that stream into six-bit chunks, and emitting the character that corresponds to each chunk. The individual encodings differ in their mapping from bit sequences to characters and in how the resulting text is formatted.
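The six-bit chunking described above can be written out by hand. This minimal sketch uses the standard base64 alphabet and "=" padding, and checks itself against Python's base64.b64encode; the function name six_bit_encode is hypothetical. Swapping ALPHABET for another 64-character table gives a different member of the family, although real uuencoding also adds a length character at the start of each line and begin/end lines, which this sketch does not.

```python
import base64

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def six_bit_encode(data: bytes) -> str:
    """Encode bytes by splitting their bit stream into 6-bit chunks (base64 mapping)."""
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        # Pack up to 3 bytes (24 bits) into one integer, zero-filling a short final chunk.
        n = int.from_bytes(chunk.ljust(3, b"\0"), "big")
        # One character per 6-bit group, most significant group first.
        chars = [ALPHABET[(n >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        # A chunk of k bytes carries k + 1 meaningful characters; pad the rest with "=".
        keep = len(chunk) + 1
        out.append("".join(chars[:keep]) + "=" * (4 - keep))
    return "".join(out)

print(six_bit_encode(b"Man"))  # TWFu
print(six_bit_encode(b"Ma"))   # TWE=
print(six_bit_encode(b"M"))    # TQ==
assert six_bit_encode(bytes(range(256))) == base64.b64encode(bytes(range(256))).decode("ascii")
```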

Some encodings (the original version of BinHex and the recommended encoding for CipherSaber) use four bits instead of six, mapping each possible four-bit sequence onto one of the 16 standard hexadecimal digits. Using four bits per encoded character makes the output 50% longer than base64 (two characters per byte instead of four per three bytes), but it simplifies encoding and decoding: expanding each source byte independently into two encoded characters is simpler than base64's expansion of three source bytes into four encoded characters.
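The simplicity argument shows up clearly in code: each byte is handled on its own, with no state carried between bytes. This is a generic hexadecimal sketch, not the exact BinHex or CipherSaber format; the function names are hypothetical and the lowercase digit choice is arbitrary. The sample payload is 12 bytes, a multiple of 3, so base64 needs no padding and the 50% length difference is exact.

```python
import base64

HEX_DIGITS = "0123456789abcdef"  # lowercase chosen arbitrarily; uppercase works the same way

def hex_encode(data: bytes) -> str:
    """Map each byte on its own to two hex digits: high nibble, then low nibble."""
    return "".join(HEX_DIGITS[b >> 4] + HEX_DIGITS[b & 0x0F] for b in data)

def hex_decode(text: str) -> bytes:
    """Inverse of hex_encode: each pair of digits becomes one byte."""
    return bytes(int(text[i:i + 2], 16) for i in range(0, len(text), 2))

payload = b"hello, world"  # 12 bytes
encoded = hex_encode(payload)
print(encoded)                                       # 68656c6c6f2c20776f726c64
print(len(encoded), len(base64.b64encode(payload)))  # 24 16
assert hex_decode(encoded) == payload
assert encoded == payload.hex()  # matches Python's built-in bytes.hex()
```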
