Binary-to-text Encoding - Encoding Standards

The table below compares the most used forms of binary-to-text encodings.

Encoding	Data type	Efficiency	Programming language implementations	Comments
Ascii85	Arbitrary	4/₅	awk, C, C#, F#, Java, Perl, Python, Python (2)
Base16 (hexadecimal)	Arbitrary	1/₂	Probably any language around
Base32	Arbitrary	5/₈ (8 bits)	ANSI C
Base64	Arbitrary	~>75% (8 bits)	C, C (2), many others
BinHex	Arbitrary	3/₄ (BinHex>=2.0)	Perl, C, C (2)	Forgotten since the mid-1980s
Intel HEX	Arbitrary	~<50%	C library, C++	Usually used for chip programming/flashing
MIME	Arbitrary	See Quoted-printable and Base64	See Quoted-printable and Base64	Encoding container for e-mail-like formatting
S-record	Arbitrary	~<50%	C library, C++	Usually used for chip programming/flashing
Percent encoding	Text (URIs), Arbitrary (RFC1738)	1/₃ (min); usually ~>40% to 70%	C, probably many others
Quoted-printable	Text	min ~>44%, but usually much closer to 1 if text is mostly ASCII	Probably many	Preserves line breaks; cuts lines at 76 characters
Uuencoding	Arbitrary	~75% (usually 60% overall)	Perl, C, probably many others	Largely replaced by MIME and yEnc
Xxencoding	Arbitrary	~75% (with similar overall to Uuencoding)	C
yEnc	Arbitrary, mostly non-text	~98%	C	Includes a CRC checksum
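
The efficiency column can be checked for a few of these encodings with a short measurement. The following is a minimal sketch, assuming Python's standard `base64` module as the encoder; the helper name `efficiency` and the 60-byte sample are illustrative choices, not part of any standard. The sample length is a multiple of 3, 4, and 5, so none of the encoders needs padding and the ratios come out exact.

```python
import base64


def efficiency(encode, data):
    """Return input bytes divided by output characters."""
    return len(data) / len(encode(data))


# 60 bytes with no zero bytes: a multiple of 3, 4 and 5, so no padding is
# added, and Ascii85's "z" shortcut for all-zero groups never triggers.
sample = bytes(range(1, 61))

ratios = {
    "Base16": efficiency(base64.b16encode, sample),
    "Base32": efficiency(base64.b32encode, sample),
    "Base64": efficiency(base64.b64encode, sample),
    "Ascii85": efficiency(base64.a85encode, sample),
}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.3f}")
```

Real-world figures are usually slightly lower than these, because padding, line breaks, and headers add characters that carry no payload.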

The 95 character codes from 32 to 126, for which the C function isprint returns true, are known as the ASCII printable characters.

Some older and now uncommon formats include BOO, BTOA, and USR encoding.

Most of these encodings generate text containing only a subset of all ASCII printable characters: for example, the base64 encoding generates text that contains only upper case and lower case letters (A–Z, a–z), numerals (0–9), and the "+", "/", and "=" symbols.
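
The subset claim can be verified directly. This is a small sketch, assuming Python's standard `base64` and `string` modules; the names `BASE64_ALPHABET`, `PRINTABLE_ASCII`, and `used` are illustrative only.

```python
import base64
import string

# The 64 data symbols plus the "=" padding symbol.
BASE64_ALPHABET = set(
    string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/="
)

# The 95 ASCII printable characters, codes 32 to 126.
PRINTABLE_ASCII = {chr(code) for code in range(32, 127)}

# Encode every possible byte value once and collect the characters used.
encoded = base64.b64encode(bytes(range(256))).decode("ascii")
used = set(encoded)

print(used <= BASE64_ALPHABET)             # every output character is in the alphabet
print(BASE64_ALPHABET < PRINTABLE_ASCII)   # the alphabet is a proper subset
```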

Some of these encodings (quoted-printable and percent encoding) are based on a set of allowed characters and a single escape character. The allowed characters are left unchanged, while all other characters are converted into a string starting with the escape character. This kind of conversion allows the resulting text to be almost readable, in that letters and digits are part of the allowed characters and are therefore left as they are in the encoded text. These encodings produce the shortest plain ASCII output for input that is mostly printable ASCII.
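
The escape mechanism described above can be sketched in a few lines. This is an illustration only, not a full implementation of either format: the function `escape_encode` and the `ALLOWED` set are hypothetical names, and the allowed set is assumed to be the unreserved URI characters. With "%" as the escape character the result matches `urllib.parse.quote` from Python's standard library when no extra safe characters are given (Python 3.7 or later). Real quoted-printable uses "=" as the escape character but has a larger allowed set and line-length rules that this sketch omits.

```python
import string
from urllib.parse import quote

# Assumed allowed set: ASCII letters, digits, and "-._~".
ALLOWED = (string.ascii_letters + string.digits + "-._~").encode("ascii")


def escape_encode(data: bytes, escape: str = "%") -> str:
    """Keep allowed bytes as they are; replace the rest with escape + two hex digits."""
    out = []
    for byte in data:
        if byte in ALLOWED:
            out.append(chr(byte))
        else:
            out.append(f"{escape}{byte:02X}")
    return "".join(out)


raw = "héllo wörld".encode("utf-8")
percent_text = escape_encode(raw)
print(percent_text)  # h%C3%A9llo%20w%C3%B6rld

# Same mechanism with a different escape character.
print(escape_encode(raw, escape="="))  # h=C3=A9llo=20w=C3=B6rld
```

The ASCII letters survive unchanged, which is why the output stays almost readable when the input is mostly text.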

Some other encodings (base64, uuencoding) are based on mapping all possible sequences of six bits into different printable characters. Since there are more than 2^6 = 64 printable ASCII characters, this is possible. A given sequence of bytes is translated by viewing it as a stream of bits, breaking this stream into chunks of six bits, and generating the sequence of corresponding characters. The different encodings differ in the mapping between sequences of bits and characters and in how the resulting text is formatted.
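
The six-bit chunking can be written out explicitly. The following is a deliberately slow sketch that builds the bit stream as a string for clarity; `base64_sketch` and `ALPHABET` are illustrative names, and the mapping and "=" padding assumed here are those of standard base64. The result is compared against Python's `base64.b64encode`.

```python
import base64
import string

# Standard base64 mapping: value 0 is "A", value 63 is "/".
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def base64_sketch(data: bytes) -> str:
    # View the bytes as one long stream of bits.
    bits = "".join(f"{byte:08b}" for byte in data)
    # Pad the stream with zero bits up to a multiple of six.
    bits += "0" * (-len(bits) % 6)
    # Map each six-bit chunk to its character.
    chars = [ALPHABET[int(bits[i:i + 6], 2)] for i in range(0, len(bits), 6)]
    # Pad the text with "=" up to a multiple of four characters.
    chars.append("=" * (-len(chars) % 4))
    return "".join(chars)


for sample in (b"", b"M", b"Ma", b"Man", b"binary \x00\xff data"):
    expected = base64.b64encode(sample).decode("ascii")
    print(base64_sketch(sample) == expected)
```

Swapping `ALPHABET` for a different 64-character table and changing the padding and line formatting is, in outline, what distinguishes the other six-bit encodings.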

Some encodings (the original version of BinHex and the recommended encoding for CipherSaber) use four bits instead of six, mapping all possible sequences of 4 bits onto the 16 standard hexadecimal digits. Using 4 bits per encoded character leads to a 50% longer output than base64, but simplifies encoding and decoding: expanding each byte in the source independently to two encoded bytes is simpler than base64's expansion of 3 source bytes to 4 encoded bytes.
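
A short sketch makes both points concrete: each byte is expanded on its own into two hexadecimal digits, and the output is 50% longer than base64 for the same input. The function name `hex_sketch` and the 30-byte payload are illustrative; the payload length is a multiple of 3 so that base64 needs no padding.

```python
import base64

HEX_DIGITS = "0123456789ABCDEF"


def hex_sketch(data: bytes) -> str:
    """Expand each byte independently into two hexadecimal digits."""
    return "".join(HEX_DIGITS[byte >> 4] + HEX_DIGITS[byte & 0x0F] for byte in data)


payload = bytes(range(30))
hex_text = hex_sketch(payload)
b64_text = base64.b64encode(payload).decode("ascii")

print(len(hex_text))                  # 60 characters: 2 per byte
print(len(b64_text))                  # 40 characters: 4 per 3 bytes
print(len(hex_text) / len(b64_text))  # 1.5
```

Because no byte depends on its neighbours, a hexadecimal decoder needs no buffering or padding logic, which is the simplification the paragraph above refers to.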
