Comparison of Unicode Encodings - For Communication and Storage

For Communication and Storage

UTF-16 and UTF-32 are not byte oriented, so a byte order must be selected when transmitting them over a byte-oriented network or storing them in a byte-oriented file. This may be achieved by standardising on a single byte order, by specifying the endianness as part of external metadata (for example the MIME charset registry has distinct UTF-16BE and UTF-16LE registrations) or by using a byte-order mark at the start of the text. UTF-8 is byte-oriented and does not have this problem.

If the byte stream is subject to corruption then some encodings recover better than others. UTF-8 and UTF-EBCDIC are best in this regard as they can always resynchronize at the start of the next code point, GB 18030 is unable to recover after a corrupt or missing byte until the next ASCII non-number. UTF-16 and UTF-32 will handle corrupt (altered) bytes by resynchronizing on the next good code point, but an odd number of lost or spurious byte (octet)s will garble all following text.

Read more about this topic: Comparison Of Unicode Encodings

Famous quotes containing the word storage:

“Many of our houses, both public and private, with their almost innumerable apartments, their huge halls and their cellars for the storage of wines and other munitions of peace, appear to me extravagantly large for their inhabitants. They are so vast and magnificent that the latter seem to be only vermin which infest them.”
—Henry David Thoreau (1817–1862)