HZ (character Encoding) - Structure and Use

In the HZ encoding system, the character sequences "~{" and "~}" act as escape sequences: everything between them is interpreted as two-byte GB2312 codes for Chinese characters, with the most significant bit of each byte ignored. Outside the escape sequences, characters are assumed to be ASCII.

An example will help illustrate the relationship between GB2312, EUC-CN, and the HZ code:

Various forms of the GB2312 code (0xD2BB) for the character "一" (one)
Form	Code	With escape sequences	Remarks
Kuten / Qūwèi / 区位 form	5027	—	Zone (ku/qū/区) 50, point (ten/wèi/位) 27
ISO 2022 form	52₁₆ 3B₁₆	0E₁₆ 52₁₆ 3B₁₆ 0F₁₆	50 + 32 = 82 = 52₁₆; 27 + 32 = 59 = 3B₁₆
EUC-CN form	D2₁₆ BB₁₆	D2₁₆ BB₁₆	52₁₆ ∨ 80₁₆ = D2₁₆; 3B₁₆ ∨ 80₁₆ = BB₁₆
HZ form (standard)	52₁₆ 3B₁₆	7E₁₆ 7B₁₆ 52₁₆ 3B₁₆ 7E₁₆ 7D₁₆	Appears as `~{R;~}` without HZ decoder
HZ form (alternate)	D2₁₆ BB₁₆	7E₁₆ 7B₁₆ D2₁₆ BB₁₆ 7E₁₆ 7D₁₆	EUC form acceptable to at least some decoders
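
The arithmetic in the table can be reproduced with a short Python sketch. The function name `gb2312_forms` and the dictionary keys are hypothetical, chosen only for illustration; the sketch assumes a valid zone/point pair in the range 1 to 94 and uses Python's built-in `gb2312` codec to check the result.

```python
def gb2312_forms(zone: int, point: int) -> dict:
    """Derive the byte forms shown in the table from a zone/point pair.

    Hypothetical helper for illustration; not part of any library.
    """
    if not (1 <= zone <= 94 and 1 <= point <= 94):
        raise ValueError("zone and point must each be in 1..94")
    # ISO 2022 form: add 32 (0x20) to the zone and to the point.
    iso2022 = bytes([zone + 0x20, point + 0x20])
    # EUC-CN form: set the most significant bit of each byte.
    euc_cn = bytes(b | 0x80 for b in iso2022)
    # Standard HZ form: the 7-bit bytes wrapped in "~{" and "~}".
    hz = b"~{" + iso2022 + b"~}"
    return {"iso2022": iso2022, "euc_cn": euc_cn, "hz": hz}


forms = gb2312_forms(50, 27)
print(forms["iso2022"].hex())            # 523b
print(forms["euc_cn"].hex())             # d2bb
print(forms["hz"])                       # b'~{R;~}'
print(forms["euc_cn"].decode("gb2312"))  # 一
```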

HZ was originally designed to be used purely as a 7-bit code. However, when situations allow, the escape sequences "~{" and "~}" sometimes surround characters represented in EUC-CN; this alternative use allows Chinese to be readable either with the help of HZ decoder software, or with a system that understands EUC-CN.
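
Because the two forms differ only in the most significant bit of each byte, a decoder can accept both by normalizing the payload first. The following is a minimal sketch of that idea; `normalize_payload` is a hypothetical name, and real decoders may handle the alternate form differently or reject it.

```python
def normalize_payload(payload: bytes, seven_bit: bool = True) -> bytes:
    """Map the bytes between "~{" and "~}" to one canonical form.

    Hypothetical helper: with seven_bit=True the high bit is cleared
    (standard HZ form); otherwise it is set (EUC-CN form).
    """
    if seven_bit:
        return bytes(b & 0x7F for b in payload)
    return bytes(b | 0x80 for b in payload)


standard = b"R;"          # payload of the standard HZ form
alternate = b"\xd2\xbb"   # payload of the alternate (EUC-CN) form

# Both payloads normalize to the same bytes in either direction.
print(normalize_payload(standard) == normalize_payload(alternate))
print(normalize_payload(standard, seven_bit=False).decode("gb2312"))
```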

Additionally, the specification defines two further rules:

the sequence "~~" is to be treated as encoding a single ASCII "~";
the character "~" followed by a newline is a line continuation, and both characters are to be discarded.

However, not all HZ decoders follow these two rules.
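
The rules described in this section can be combined into a small decoder. This is a sketch under stated assumptions, not a reference implementation: the name `hz_decode` is hypothetical, the high bit is forced on so that both the standard and the alternate form are accepted, an unrecognized "~" sequence in ASCII mode is passed through unchanged, and non-ASCII bytes outside the escape sequences are replaced with U+FFFD. Other decoders may make different choices on each of these points.

```python
def hz_decode(data: bytes) -> str:
    """Decode HZ-encoded bytes to a string (illustrative sketch only)."""
    out = []
    i = 0
    in_gb = False  # True while between "~{" and "~}"
    while i < len(data):
        if not in_gb:
            if data[i] == 0x7E:  # "~" starts a possible escape sequence
                nxt = data[i + 1:i + 2]
                if nxt == b"{":      # enter GB2312 mode
                    in_gb = True
                    i += 2
                    continue
                if nxt == b"~":      # "~~" encodes a single "~"
                    out.append("~")
                    i += 2
                    continue
                if nxt == b"\n":     # "~" + newline is discarded
                    i += 2
                    continue
                # Assumption: any other "~" is passed through as-is.
            b = data[i]
            out.append(chr(b) if b < 0x80 else "\ufffd")
            i += 1
        else:
            if data[i:i + 2] == b"~}":  # leave GB2312 mode
                in_gb = False
                i += 2
                continue
            pair = data[i:i + 2]
            if len(pair) < 2:
                raise ValueError("truncated two-byte GB2312 code")
            # Set the high bit so 7-bit and EUC-CN payloads decode alike.
            euc = bytes(b | 0x80 for b in pair)
            out.append(euc.decode("gb2312"))
            i += 2
    return "".join(out)


print(hz_decode(b"one: ~{R;~}"))         # standard form
print(hz_decode(b"one: ~{\xd2\xbb~}"))   # alternate form
print(hz_decode(b"a~~b"))                # a~b
```

Python also ships a built-in `hz` codec, which is the better choice outside of illustration; its handling of the alternate form and of the two "~" rules may differ from this sketch.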
