GBK - Encoding

Encoding

A character is encoded as 1 or 2 bytes. A byte in the range 00–7F is a single byte that means the same thing as it does in ASCII. Strictly speaking, this range holds 95 printable characters (20–7E) and 33 control codes (00–1F and 7F).

A byte with the high bit set indicates that it is the first of 2 bytes. Loosely speaking, the first byte is in the range 81–FE (that is, never 80 or FF), and the second byte is 40–7E for some areas and 80–FE for others.
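The loose rules above are enough to split a byte stream into 1-byte and 2-byte units. The following is a minimal Python sketch under that assumption; the function name `split_gbk` is hypothetical, and the check accepts any trail byte in 40–7E or 80–FE rather than validating against the per-level ranges.

```python
def split_gbk(data: bytes) -> list:
    """Split bytes into 1- and 2-byte units using the loose GBK rules.

    Illustrative only: it checks lead bytes 81-FE and trail bytes
    40-7E or 80-FE, not whether the code point is actually assigned.
    """
    units = []
    i = 0
    while i < len(data):
        b1 = data[i]
        if b1 <= 0x7F:
            # Single byte, same meaning as in ASCII.
            units.append(data[i:i + 1])
            i += 1
            continue
        if not 0x81 <= b1 <= 0xFE:
            # 80 and FF are never valid lead bytes.
            raise ValueError(f"invalid lead byte {b1:#04x} at offset {i}")
        if i + 1 >= len(data):
            raise ValueError(f"truncated 2-byte sequence at offset {i}")
        b2 = data[i + 1]
        if not (0x40 <= b2 <= 0x7E or 0x80 <= b2 <= 0xFE):
            raise ValueError(f"invalid trail byte {b2:#04x} at offset {i + 1}")
        units.append(data[i:i + 2])
        i += 2
    return units
```

For example, `split_gbk(b"A\xd6\xd0")` yields one single-byte unit for `A` followed by one two-byte unit. Note that a trail byte in 40–7E overlaps ASCII, which is why the stream has to be scanned from the start rather than from an arbitrary offset.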

More specifically, the following ranges of bytes are defined:

GBK Encoding Ranges

                                                    characters
range         byte 1  byte 2           code points  GB 18030   GBK 1.0  Codepage 936  GB 2312
Level GBK/1   A1–A9   A1–FE                    846       728       717           702      682
Level GBK/2   B0–F7   A1–FE                  6,768     6,763     6,763         6,763    6,763
Level GBK/3   81–A0   40–FE except 7F        6,080     6,080     6,080         6,080
Level GBK/4   AA–FE   40–A0 except 7F        8,160     8,160     8,160         8,080
Level GBK/5   A8–A9   40–A0 except 7F          192       166       166           166
user-defined  AA–AF   A1–FE                    564
user-defined  F8–FE   A1–FE                    658
user-defined  A1–A7   40–A0 except 7F          672
total:                                      23,940    21,897    21,886        21,791    7,445
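The code point counts in the table follow directly from the byte ranges: the number of lead bytes times the number of trail bytes, minus the excluded 7F where it falls inside the trail range. The sketch below, in Python, recomputes those counts and classifies a 2-byte code by area. The names `RANGES`, `code_points`, and `classify` are hypothetical, and the user-defined areas are labelled by their lead-byte range only to tell them apart.

```python
# (area, first lead, last lead, first trail, last trail), taken from the table.
RANGES = [
    ("GBK/1", 0xA1, 0xA9, 0xA1, 0xFE),
    ("GBK/2", 0xB0, 0xF7, 0xA1, 0xFE),
    ("GBK/3", 0x81, 0xA0, 0x40, 0xFE),
    ("GBK/4", 0xAA, 0xFE, 0x40, 0xA0),
    ("GBK/5", 0xA8, 0xA9, 0x40, 0xA0),
    ("user-defined (AA-AF)", 0xAA, 0xAF, 0xA1, 0xFE),
    ("user-defined (F8-FE)", 0xF8, 0xFE, 0xA1, 0xFE),
    ("user-defined (A1-A7)", 0xA1, 0xA7, 0x40, 0xA0),
]


def code_points(lead_lo, lead_hi, trail_lo, trail_hi):
    """Count the 2-byte codes in one area, skipping trail byte 7F."""
    leads = lead_hi - lead_lo + 1
    trails = trail_hi - trail_lo + 1
    if trail_lo <= 0x7F <= trail_hi:
        trails -= 1  # 7F is excluded wherever the trail range spans it
    return leads * trails


def classify(b1, b2):
    """Return the area name for a 2-byte code, or None if it is invalid."""
    if b2 == 0x7F:
        return None
    for name, lead_lo, lead_hi, trail_lo, trail_hi in RANGES:
        if lead_lo <= b1 <= lead_hi and trail_lo <= b2 <= trail_hi:
            return name
    return None


if __name__ == "__main__":
    for name, *bounds in RANGES:
        print(f"{name:22} {code_points(*bounds):6,}")
    print(f"{'total':22} {sum(code_points(*r[1:]) for r in RANGES):6,}")
```

The areas do not overlap, so the order of entries in `RANGES` does not affect the result of `classify`.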

In graphical form, the following figure shows the space of all 64K possible 2-byte codes. Green and yellow areas are assigned GBK code points; red areas are for user-defined characters. The uncolored areas are invalid byte combinations.
