Extended ASCII - Multi Byte Character Sets

Multi Byte Character Sets

There are multi-byte character sets (character sets that can represent more than 256 different characters) that are also true extended ASCII. That means every byte in the range 0x00-0x7F has the same meaning as in ASCII and never occurs as part of another character. UTF-8 is one such character set.
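The property can be checked empirically for a sample string. The following is a minimal sketch, not a proof; the helper name `is_ascii_transparent` is hypothetical and only Python's built-in codecs are assumed.

```python
def is_ascii_transparent(text, encoding):
    """Check the 'true extended ASCII' property for one sample string.

    Returns True if every ASCII character is stored as its single ASCII
    byte and no non-ASCII character contains a byte in 0x00-0x7F.
    """
    for ch in text:
        encoded = ch.encode(encoding)
        if ord(ch) < 0x80:
            # An ASCII character must be stored as exactly its ASCII byte.
            if encoded != bytes([ord(ch)]):
                return False
        elif any(b < 0x80 for b in encoded):
            # A non-ASCII character must not reuse any ASCII-range byte.
            return False
    return True


sample = 'key = "Grüße, 日本語"'
print(is_ascii_transparent(sample, "utf-8"))
```

A check like this only covers the characters in the sample, so it can show that an encoding fails the property but cannot establish that it always holds.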

ISO/IEC 6937 comes close, differing from ASCII in only one character.

Such character sets can be used in file formats, including most programming languages, where only ASCII bytes are used for keywords and syntax, while bytes 0x80-0xFF may be used for free text. This makes it much easier to introduce a multi-byte character set into existing systems that already use extended ASCII.
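As an illustration, a parser for a simple line-based format can work purely on bytes and still handle UTF-8 text safely. The `key=value` format and the `parse_config` function below are hypothetical, chosen only to sketch the idea.

```python
def parse_config(raw):
    """Parse 'key=value' lines at the byte level.

    Only the ASCII bytes '\\n', '#' and '=' are treated as syntax.
    Bytes 0x80-0xFF in the values are passed through untouched.
    """
    entries = {}
    for line in raw.split(b"\n"):
        if not line or line.startswith(b"#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition(b"=")
        if sep:
            entries[key.strip()] = value.strip()
    return entries


raw = "# sample file\ntitle = Grüße\nauthor = 日本語\n".encode("utf-8")
parsed = parse_config(raw)
print(parsed[b"title"].decode("utf-8"))
print(parsed[b"author"].decode("utf-8"))
```

The parser never decodes the data, so it needs no knowledge of UTF-8. It stays correct because no UTF-8 multi-byte sequence contains the bytes it uses as syntax.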

Other character sets, such as Shift JIS and UTF-16, are not true extended ASCII, since ASCII bytes (0x00-0x7F) can appear as part of other characters. Shift JIS is sometimes called extended ASCII because ASCII characters are stored as single ASCII bytes, but other characters can also include bytes in the ASCII range. Shift JIS can nevertheless be used directly in programming languages and in languages such as HTML, since the bytes used as free-text delimiters do not occur as part of non-ASCII characters. UTF-16 is even further from extended ASCII, since ASCII characters are stored as two bytes, one of which is 0x00. Porting an existing system to support character sets such as Shift JIS or UTF-16 is complicated and error-prone.
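The difference shows up when looking at the encoded bytes. The sketch below uses Python's built-in `shift_jis` and `utf-16-le` codecs; the helper names are hypothetical. It relies on the katakana character ソ, whose Shift JIS form ends in byte 0x5C, the position ASCII uses for the backslash.

```python
def encoded_hex(text, encoding):
    """Return the encoded bytes of text as space-separated hex pairs."""
    return " ".join(f"{b:02x}" for b in text.encode(encoding))


def contains_byte(text, encoding, byte):
    """Return True if the encoded form of text contains the given byte value."""
    return byte in text.encode(encoding)


BACKSLASH = 0x5C

# Shift JIS: the second byte of this character is 0x5C.
print(encoded_hex("ソ", "shift_jis"))                # 83 5c
print(contains_byte("ソ", "shift_jis", BACKSLASH))   # True

# UTF-8: the same character uses only bytes >= 0x80.
print(encoded_hex("ソ", "utf-8"))                    # e3 82 bd
print(contains_byte("ソ", "utf-8", BACKSLASH))       # False

# UTF-16 (little-endian): an ASCII character gains a 0x00 byte.
print(encoded_hex("A", "utf-16-le"))                 # 41 00
```

A byte-oriented program that treats 0x5C as an escape character, or 0x00 as a string terminator, would misread this data, which is one source of the porting bugs described above.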
