Size Issues
UTF-32/UCS-4 requires four bytes to encode any character. Since characters outside the basic multilingual plane (BMP) are typically rare, a document encoded in UTF-32 will often be nearly twice as large as its UTF-16/UCS-2–encoded equivalent because UTF-16 uses two bytes for the characters inside the BMP, or four bytes otherwise.
UTF-8 uses between one and four bytes to encode a character. It requires one byte for ASCII characters, making it half the space of UTF-16 for texts consisting only of ASCII. For other Latin characters and many non-Latin scripts it requires two bytes, the same as UTF-16. Only a few frequently used Western characters in the range U+0800 to U+FFFF, such as the € sign U+20AC, require three bytes in UTF-8. Characters outside of the BMP above U+FFFF need four bytes in UTF-8 and UTF-16.
The conservation of bytes in encoding files to a Unicode transformation format (UTF) depends on encoded code points, namely, blocks from which those code points are drawn. Say, it depends on the scripts in use. For example, UTF-16 use less space than UTF-32 only for characters from BMP, which are though overwhelmingly most common of all Unicode. In the same way using characters predominantly from the UTF-8 scripts makes UTF-8 more space efficient than UTF-16. The UTF-8 scripts are those scripts where UTF-8 only requires fewer than three bytes per character (only one byte for the ASCII-equivalent Basic Latin block, digits and most punctuation marks) and include: Latin, Greek, Cyrillic, Armenian, Hebrew, Arabic, Syriac, Thaana, N'Ko, and the IPA and other Latin-based phonetic alphabets.
All printable characters in UTF-EBCDIC use at least as many bytes as in UTF-8, and most use more, due to a decision made to allow encoding the C1 control codes as single bytes.
For seven-bit environments, UTF-7 is more space efficient than the combination of other Unicode encodings with quoted-printable or base64 for almost all types of text (see "Seven-bit environments" below).
Read more about this topic: Comparison Of Unicode Encodings
Famous quotes containing the words size and/or issues:
“One writes of scars healed, a loose parallel to the pathology of the skin, but there is no such thing in the life of an individual. There are open wounds, shrunk sometimes to the size of a pin-prick but wounds still. The marks of suffering are more comparable to the loss of a finger, or the sight of an eye. We may not miss them, either, for one minute in a year, but if we should there is nothing to be done about it.”
—F. Scott Fitzgerald (18961940)
“I can never bring you to realize the importance of sleeves, the suggestiveness of thumb-nails, or the great issues that may hang from a boot-lace.”
—Sir Arthur Conan Doyle (18591930)