Big5 - Organization

The original Big5 character set is sorted first by usage frequency, then by stroke count, and finally by Kangxi radical.

The original Big5 character set lacked many commonly used characters. To solve this problem, each vendor developed its own extension. The ETen extension became part of the current Big5 standard through popularity.

The structure of Big5 does not conform to the ISO 2022 standard, but rather bears a certain similarity to the Shift JIS encoding. It is a double-byte character set (DBCS) with the following structure:

First byte ("lead byte") 0x81 to 0xfe (or 0xa1 to 0xf9 for non-user-defined characters)
Second byte 0x40 to 0x7e, 0xa1 to 0xfe

(the prefix 0x signifying hexadecimal numbers).
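
The byte ranges above can be written as two small range checks. This is a minimal Python sketch; the function names and the include_user_defined flag are illustrative and do not come from any standard library.

```python
def is_lead_byte(b: int, include_user_defined: bool = True) -> bool:
    """Return True if b lies in the lead-byte range quoted above."""
    if include_user_defined:
        return 0x81 <= b <= 0xFE
    # Narrower range for non-user-defined characters.
    return 0xA1 <= b <= 0xF9


def is_trail_byte(b: int) -> bool:
    """Return True if b lies in either of the two second-byte ranges."""
    return 0x40 <= b <= 0x7E or 0xA1 <= b <= 0xFE
```

Note that the second-byte range overlaps printable ASCII (0x40 to 0x7e), so a byte cannot be classified as a second byte without knowing what precedes it.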

Certain variants of the Big5 character set, for example HKSCS, use an expanded range for the lead byte that includes values from 0x81 to 0xa0 (similar to Shift JIS).

If the second byte is not in the correct range, behaviour is undefined (i.e., varies from system to system).

The numerical value of an individual Big5 code is frequently given as a 4-digit hexadecimal number, which describes the two bytes that make up the code as if they were the big-endian representation of a 16-bit number. For example, the Big5 code for a full-width space, which consists of the bytes 0xa1 0x40, is usually written as 0xa140 or just A140.
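
The conversion between the 4-digit notation and the underlying byte pair is a plain big-endian 16-bit conversion. A short Python sketch follows; the helper names are hypothetical.

```python
def big5_code_to_bytes(code: int) -> bytes:
    """Turn a 16-bit Big5 code such as 0xA140 into its two bytes."""
    return code.to_bytes(2, "big")


def big5_bytes_to_code(pair: bytes) -> int:
    """Turn a two-byte Big5 sequence into its 16-bit code."""
    if len(pair) != 2:
        raise ValueError("a Big5 double-byte code is exactly two bytes")
    return int.from_bytes(pair, "big")


def big5_code_label(pair: bytes) -> str:
    """Format a byte pair in the conventional 4-digit form, e.g. 'A140'."""
    return format(big5_bytes_to_code(pair), "04X")
```

For the full-width space, big5_code_to_bytes(0xA140) yields the bytes 0xa1 0x40, and big5_code_label applied to those bytes yields "A140".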

Strictly speaking, the Big5 encoding contains only DBCS characters. In practice, however, Big5 codes are always used together with an unspecified, system-dependent single-byte character set (ASCII, or an 8-bit character set such as code page 437), so Big5-encoded text usually contains a mix of DBCS characters and single-byte characters. Bytes in the range 0x00 to 0x7f that are not part of a double-byte character are assumed to be single-byte characters. (For a more detailed description of this problem, please see the discussion on "The Matching SBCS" below.)
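
One way to picture the mix of single-byte and double-byte characters is a scanner that walks the byte stream from left to right. The sketch below is illustrative only: the function name and the "single", "double", and "invalid" labels are made up for this example, and reporting a bad byte as a one-byte "invalid" unit is an assumption, since real systems differ on how they handle such bytes.

```python
from typing import Iterator, Tuple


def split_big5(data: bytes) -> Iterator[Tuple[str, bytes]]:
    """Yield (kind, raw_bytes) units from a Big5-style byte stream."""
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        if b <= 0x7F:
            # Not claimed by a preceding lead byte: single-byte character.
            yield ("single", data[i:i + 1])
            i += 1
            continue
        if 0x81 <= b <= 0xFE and i + 1 < n:
            t = data[i + 1]
            if 0x40 <= t <= 0x7E or 0xA1 <= t <= 0xFE:
                yield ("double", data[i:i + 2])
                i += 2
                continue
        # Assumption: anything else is reported as a lone invalid byte.
        yield ("invalid", data[i:i + 1])
        i += 1
```

Because second bytes may fall in the ASCII range, the scanner must always start at a known character boundary; for example, the bytes 0xa1 0x41 form one double-byte unit, not a lead byte followed by the letter "A".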

The meaning of a non-ASCII byte that falls outside the permitted ranges and is not part of a double-byte character varies from system to system. In old MS-DOS-based systems, such bytes are likely to be displayed as 8-bit characters; in modern systems, they are likely to either give unpredictable results or generate an error.
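
As one concrete example of the "generate an error" case, Python's built-in "big5" codec rejects stray bytes by default and can be told to substitute a replacement character instead. The wrapper below is a hypothetical convenience function; only bytes.decode and its errors argument are part of Python itself, and other systems may behave differently.

```python
def decode_big5(data: bytes, lenient: bool = False) -> str:
    """Decode bytes with Python's built-in "big5" codec.

    With lenient=False, bytes the codec rejects raise UnicodeDecodeError.
    With lenient=True, they are replaced by U+FFFD.
    """
    return data.decode("big5", errors="replace" if lenient else "strict")
```

For instance, the stray byte 0x80 in b"A\x80B" raises UnicodeDecodeError in strict mode, while the lenient mode returns a string containing U+FFFD in its place.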
