Character Sets, Code Pages, and Character Maps
In computer science, the terms character encoding, character map, character set and code page were historically synonymous, because the same standard would specify a repertoire of characters and how those characters were to be encoded into a stream of code units, usually with a single character per code unit. The terms now have related but distinct meanings, reflecting the efforts of standards bodies to use precise terminology when describing and unifying many different encoding systems. Nevertheless, the terms are still often used interchangeably, and character set remains nearly ubiquitous.
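To make the distinction between a character and the code units that encode it concrete, here is a minimal Python sketch using only built-in codecs. The helper `code_units` is a hypothetical name introduced for illustration; it assumes big-endian byte order when grouping bytes into multi-byte code units, so it should be paired with a big-endian codec such as `utf-16-be`.

```python
def code_units(text: str, encoding: str, unit_bytes: int) -> list[int]:
    """Encode `text`, then split the bytes into code units of `unit_bytes` bytes.

    Hypothetical helper for illustration: multi-byte units are read big-endian.
    """
    data = text.encode(encoding)
    return [
        int.from_bytes(data[i:i + unit_bytes], "big")
        for i in range(0, len(data), unit_bytes)
    ]


# One abstract character (U+00E9, "é") yields different code-unit sequences.
for enc, width in [("latin-1", 1), ("utf-8", 1), ("utf-16-be", 2)]:
    units = code_units("\u00e9", enc, width)
    print(enc, [f"0x{u:0{width * 2}X}" for u in units])
# latin-1 ['0xE9']
# utf-8 ['0xC3', '0xA9']
# utf-16-be ['0x00E9']

# A character outside the Basic Multilingual Plane needs two UTF-16 code units.
print([f"0x{u:04X}" for u in code_units("\U0001F600", "utf-16-be", 2)])
# ['0xD83D', '0xDE00']
```

Latin-1 keeps the historical one-character-per-code-unit arrangement, while UTF-8 and UTF-16 may spend several code units on a single character.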
A code page usually means a byte-oriented encoding that belongs to a suite of encodings covering different scripts, in which many characters share the same codes in most or all of the code pages. Well-known code page suites are "Windows" (based on Windows-1252) and "IBM"/"DOS" (based on code page 437); see Windows code page for details. Most, but not all, encodings referred to as code pages are single-byte encodings (although a byte is not necessarily eight bits; see octet).
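The overlap between code pages can be checked with Python's built-in `cp437` and `cp1252` codecs, used here as stand-ins for the base code pages of the two suites. This is a sketch assuming Python's codec tables are representative; the helper `shared_bytes` is a hypothetical name.

```python
def shared_bytes(cp_a: str, cp_b: str) -> list[int]:
    """Return the byte values that decode to the same character in both code pages."""
    same = []
    for b in range(256):
        raw = bytes([b])
        try:
            if raw.decode(cp_a) == raw.decode(cp_b):
                same.append(b)
        except UnicodeDecodeError:
            # Python's cp1252 leaves a few byte values (e.g. 0x81) undefined.
            continue
    return same


shared = shared_bytes("cp437", "cp1252")
print(all(b in shared for b in range(0x80)))  # True: both agree on the ASCII range

# Above 0x7F they diverge: the same letter sits at different byte values.
print(bytes([0x82]).decode("cp437"), bytes([0x82]).decode("cp1252"))  # é ‚
print(bytes([0xE9]).decode("cp437"), bytes([0xE9]).decode("cp1252"))  # Θ é
```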
IBM's Character Data Representation Architecture (CDRA) designates entities with coded character set identifiers (CCSIDs), each of which is variously called a charset, character set, code page, or CHARMAP.
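As a rough illustration of how data tagged with a CCSID might be decoded, the sketch below maps a handful of CCSID numbers to Python codec names. The table `CCSID_TO_CODEC` and the function `decode_ccsid` are hypothetical and cover only an illustrative subset; this is not an IBM-provided registry, and Python's codec tables may differ from IBM's in edge cases.

```python
# Illustrative subset only (an assumption of this sketch, not an IBM registry).
CCSID_TO_CODEC = {
    37: "cp037",      # EBCDIC, US/Canada
    437: "cp437",     # IBM PC / DOS
    819: "latin-1",   # ISO 8859-1
    1208: "utf-8",    # UTF-8
    1252: "cp1252",   # Windows Latin-1
}


def decode_ccsid(data: bytes, ccsid: int) -> str:
    """Decode bytes tagged with a CCSID using the CCSID_TO_CODEC table."""
    try:
        codec = CCSID_TO_CODEC[ccsid]
    except KeyError:
        raise ValueError(f"no codec known for CCSID {ccsid}") from None
    return data.decode(codec)


print(decode_ccsid(b"\xc1\xc2\xc3", 37))  # ABC (EBCDIC letters are not ASCII bytes)
print(decode_ccsid(b"ABC", 819))          # ABC
```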
The term code page does not occur in Unix or Linux, where charmap is preferred, usually in the larger context of locales.
In contrast to a coded character set (CCS), a character encoding is a map from abstract characters to code words. In HTTP (and MIME) parlance, a character set is the same as a character encoding, not the same as a CCS.
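Because an HTTP or MIME charset names an encoding, its value can be handed straight to a decoder. The sketch below uses the standard-library `email.message.Message` class, whose `get_content_charset()` method returns the lower-cased `charset` parameter of a Content-Type value. The function name `decode_with_charset` is hypothetical, and the `utf-8` fallback when no charset is given is an assumption of this sketch, not a rule taken from HTTP.

```python
from email.message import Message


def decode_with_charset(content_type: str, body: bytes, fallback: str = "utf-8") -> str:
    """Decode `body` using the charset parameter of a Content-Type value.

    The fallback encoding is an assumption of this sketch.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    # Returns the charset parameter lower-cased, or `fallback` if it is absent.
    charset = msg.get_content_charset(fallback)
    return body.decode(charset)


print(decode_with_charset("text/html; charset=UTF-8", b"caf\xc3\xa9"))      # café
print(decode_with_charset("text/plain; charset=windows-1252", b"caf\xe9"))  # café
```

The same word, "café", arrives as different bytes under the two labels, which is exactly why the parameter must identify an encoding rather than just a repertoire.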
Legacy encoding is a term sometimes used to characterize old character encodings, but its sense is ambiguous. Most of its use is in the context of Unicodification, where it refers to encodings that fail to cover all Unicode code points or, more generally, that use a somewhat different character repertoire: several code points representing one Unicode character, or vice versa (see, e.g., code page 437). Some sources refer to an encoding as legacy only because it preceded Unicode. All Windows code pages are usually referred to as legacy, both because they predate Unicode and because they cannot represent all 1,114,112 Unicode code points (U+0000 through U+10FFFF, a range that requires 21 bits).
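A quick way to see the coverage gap is to try encoding text with a legacy code page and collect the characters that fail. This is a minimal sketch with Python's built-in codecs; the helper `unrepresentable` is a hypothetical name.

```python
def unrepresentable(text: str, encoding: str) -> list[str]:
    """Return the characters of `text` that `encoding` cannot encode."""
    missing = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            missing.append(ch)
    return missing


sample = "caf\u00e9 \u03a9 \U0001F600"  # "café", Greek capital omega, an emoji
print(ascii(unrepresentable(sample, "cp1252")))  # ['\u03a9', '\U0001f600']
print(ascii(unrepresentable(sample, "cp437")))   # ['\U0001f600']: cp437 has omega
print(unrepresentable(sample, "utf-8"))          # []

# The Unicode codespace, U+0000..U+10FFFF, and the bits needed to index it.
print(0x10FFFF + 1, (0x10FFFF).bit_length())  # 1114112 21
```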