Encoding
A character is encoded as 1 or 2 bytes. A byte in the range 00–7F is a single byte that means the same thing as it does in ASCII. Strictly speaking, there are 95 characters and 33 control codes in this range.
A byte with the high bit set indicates that it is the first of 2 bytes. Loosely speaking, the first byte is in the range 81–FE (that is, never 80 or FF), and the second byte is in the range 40–FE excluding 7F; some areas use only 40–A0 and others only A1–FE.
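As a rough illustration of the rule above, the following Python sketch splits a byte string into 1-byte and 2-byte units by inspecting each lead byte. The function name `split_gbk_units` is hypothetical, and the check only applies the loose ranges described here (lead byte 81–FE, second byte 40–FE except 7F); it does not verify that a pair is actually assigned a character.

```python
def split_gbk_units(data: bytes) -> list[bytes]:
    """Split data into 1-byte (ASCII) and 2-byte units using the loose GBK rules."""
    units = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead <= 0x7F:
            # High bit clear: a single byte with its ASCII meaning.
            units.append(data[i:i + 1])
            i += 1
        elif 0x81 <= lead <= 0xFE:
            # High bit set: this is the first byte of a 2-byte code.
            if i + 1 >= len(data):
                raise ValueError(f"truncated 2-byte code at offset {i}")
            trail = data[i + 1]
            if not (0x40 <= trail <= 0xFE) or trail == 0x7F:
                raise ValueError(f"invalid second byte {trail:#04x} at offset {i + 1}")
            units.append(data[i:i + 2])
            i += 2
        else:
            # 0x80 and 0xFF are never valid first bytes.
            raise ValueError(f"invalid first byte {lead:#04x} at offset {i}")
    return units
```

For example, `split_gbk_units(b"A\xd6\xd0")` yields the ASCII unit `b"A"` followed by the 2-byte unit `b"\xd6\xd0"`.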
More specifically, the following ranges of bytes are defined:
range | byte 1 | byte 2 | code points | characters in GB 18030 | characters in GBK 1.0 | characters in Codepage 936 | characters in GB 2312
---|---|---|---|---|---|---|---
Level GBK/1 | A1–A9 | A1–FE | 846 | 728 | 717 | 702 | 682
Level GBK/2 | B0–F7 | A1–FE | 6,768 | 6,763 | 6,763 | 6,763 | 6,763
Level GBK/3 | 81–A0 | 40–FE except 7F | 6,080 | 6,080 | 6,080 | 6,080 |
Level GBK/4 | AA–FE | 40–A0 except 7F | 8,160 | 8,160 | 8,160 | 8,080 |
Level GBK/5 | A8–A9 | 40–A0 except 7F | 192 | 166 | 166 | 166 |
user-defined | AA–AF | A1–FE | 564 | | | |
user-defined | F8–FE | A1–FE | 658 | | | |
user-defined | A1–A7 | 40–A0 except 7F | 672 | | | |
total: | | | 23,940 | 21,897 | 21,886 | 21,791 | 7,445
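The code point counts in the table follow directly from the byte ranges: each is the number of first-byte values times the number of second-byte values, with 7F excluded. The Python sketch below recomputes them and classifies a 2-byte code by area. The names `REGIONS`, `code_points`, and `classify` are illustrative only and not part of any standard or library.

```python
# (label, (byte 1 low, byte 1 high), (byte 2 low, byte 2 high)), taken from the table.
REGIONS = [
    ("GBK/1", (0xA1, 0xA9), (0xA1, 0xFE)),
    ("GBK/2", (0xB0, 0xF7), (0xA1, 0xFE)),
    ("GBK/3", (0x81, 0xA0), (0x40, 0xFE)),
    ("GBK/4", (0xAA, 0xFE), (0x40, 0xA0)),
    ("GBK/5", (0xA8, 0xA9), (0x40, 0xA0)),
    ("user-defined AA-AF", (0xAA, 0xAF), (0xA1, 0xFE)),
    ("user-defined F8-FE", (0xF8, 0xFE), (0xA1, 0xFE)),
    ("user-defined A1-A7", (0xA1, 0xA7), (0x40, 0xA0)),
]


def code_points(byte1, byte2):
    """Count the 2-byte codes in a region; 7F is never a valid second byte."""
    lo1, hi1 = byte1
    lo2, hi2 = byte2
    second = hi2 - lo2 + 1
    if lo2 <= 0x7F <= hi2:
        second -= 1
    return (hi1 - lo1 + 1) * second


def classify(b1, b2):
    """Return the label of the region containing (b1, b2), or None if invalid."""
    if b2 == 0x7F:
        return None
    for label, (lo1, hi1), (lo2, hi2) in REGIONS:
        if lo1 <= b1 <= hi1 and lo2 <= b2 <= hi2:
            return label
    return None


counts = {label: code_points(r1, r2) for label, r1, r2 in REGIONS}
total = sum(counts.values())
```

With these definitions, `counts["GBK/1"]` is 9 × 94 = 846 and `total` is 23,940, matching the table. Pairs that fall in none of the listed ranges are reported as `None`.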
In graphical form, the following figure shows the space of all 64K possible 2-byte codes. Green and yellow areas are assigned GBK codepoints, red are for user-defined characters. The uncolored areas are invalid byte combinations.