Extended Unix Code - EUC-JP

EUC-JP is a variable-width encoding used to represent the elements of three Japanese character set standards, namely JIS X 0208, JIS X 0212, and JIS X 0201.

  • A character from the lower half of JIS-X-0201 (ASCII, code set 0) is represented by one byte, in the range 0x21 – 0x7E.
  • A character from the upper half of JIS-X-0201 (half-width kana, code set 2) is represented by two bytes, the first being 0x8E, the second in the range 0xA1 – 0xDF.
  • A character from JIS-X-0208 (code set 1) is represented by two bytes, both in the range 0xA1 – 0xFE.
  • A character from JIS-X-0212 (code set 3) is represented by three bytes, the first being 0x8F, the following two in the range 0xA1 – 0xFE.

This encoding scheme allows the easy mixing of 7-bit ASCII and 8-bit Japanese without the need for the escape characters employed by ISO-2022-JP, which is based on the same character set standards.

In Japan, the EUC-JP encoding is heavily used by Unix or Unix-like operating systems (except for HP-UX), while Shift_JIS or its extensions (Windows code page 932 and MacJapanese) are used on other platforms. Therefore, whether Japanese web sites use EUC-JP or Shift_JIS often depends on what OS the author uses.

EUC-JISX0213 is similar to but different from EUC-JP in that two planes of JIS X 0213 take place of JIS-X-0208 and JIS-X-0212. There is a similar relationship between Shift_JIS and Shift-JISX0213.

Read more about this topic:  Extended Unix Code