Unicode Control Characters - Language Tags

Language Tags

Unicode reserves a block of 128 code points (U+E0000 to U+E007F) for language tags. These characters essentially mirror the 128 ASCII characters, each offset by U+E0000, and identify the text that follows as belonging to a particular language according to BCP 47. For example, to mark subsequent text as English as written in the United States, the initiating ‘Language Tag’ character (U+E0001) would be followed by the sequence ‘Tag Latin Small Letter E’ (U+E0065), ‘Tag Latin Small Letter N’ (U+E006E), ‘Tag Hyphen-Minus’ (U+E002D), ‘Tag Latin Small Letter U’ (U+E0075) and ‘Tag Latin Small Letter S’ (U+E0073), which together spell "en-us".
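The mapping described above can be sketched in a few lines of Python. This is only an illustration: the function name encode_language_tag and the constants are hypothetical, and the sketch assumes nothing beyond the fixed U+E0000 offset between an ASCII character and its tag counterpart. It does not validate that the input is a well-formed BCP 47 tag.

```python
LANGUAGE_TAG = 0xE0001  # U+E0001, the initiating Language Tag character
TAG_OFFSET = 0xE0000    # tag character = ASCII code point + 0xE0000


def encode_language_tag(tag: str) -> str:
    """Spell an ASCII language tag such as "en-us" in tag characters.

    Hypothetical helper for illustration; only printable ASCII
    (U+0020 to U+007E) is accepted, since those are the characters
    mirrored at U+E0020 to U+E007E.
    """
    for ch in tag:
        if not 0x20 <= ord(ch) <= 0x7E:
            raise ValueError(f"not a printable ASCII character: {ch!r}")
    return chr(LANGUAGE_TAG) + "".join(chr(TAG_OFFSET + ord(ch)) for ch in tag)


encoded = encode_language_tag("en-us")
# List the code points to compare against the sequence given in the text.
code_points = [f"U+{ord(ch):05X}" for ch in encoded]
print(code_points)
```

Printing the code points rather than the string itself is deliberate, since the tag characters have no visible glyphs.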

These language tag characters would not be displayed themselves. However, they would provide information for text processing or even for the display of other characters. For example, a renderer might substitute different glyphs for Unihan ideographs if the language tags indicated Korean than if they indicated Japanese. As another example, the tags might influence how the decimal digits 0 through 9 are displayed, depending on the language in which they appear.
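As a rough sketch of how a text-processing step might use such invisible tags, the following Python reads a leading language tag and then strips all tag characters before display. The helper names leading_language and strip_tag_characters are hypothetical, and the sketch assumes the tag sequence sits at the very start of the string; real text could carry tags elsewhere.

```python
TAG_BLOCK_START = 0xE0000  # first code point of the tag block
TAG_BLOCK_END = 0xE007F    # last code point of the tag block
LANGUAGE_TAG = 0xE0001     # U+E0001, the initiating Language Tag character


def leading_language(text: str):
    """Return the ASCII tag spelled by a leading tag sequence, or None."""
    if not text or ord(text[0]) != LANGUAGE_TAG:
        return None
    letters = []
    for ch in text[1:]:
        cp = ord(ch)
        if 0xE0020 <= cp <= 0xE007E:
            # Map the tag character back to its ASCII counterpart.
            letters.append(chr(cp - TAG_BLOCK_START))
        else:
            break  # first ordinary character ends the tag sequence
    return "".join(letters) or None


def strip_tag_characters(text: str) -> str:
    """Drop every character in the tag block, leaving the visible text."""
    return "".join(
        ch for ch in text if not TAG_BLOCK_START <= ord(ch) <= TAG_BLOCK_END
    )


# "ja" spelled in tag characters, followed by two ideographs.
tagged = "\U000E0001\U000E006A\U000E0061" + "漢字"
language = leading_language(tagged)
visible = strip_tag_characters(tagged)
print(language, visible)
```

A renderer could then pass the recovered language to whatever font or glyph selection it uses, while showing only the stripped text.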

The tag characters were deprecated in Unicode 5.1 (2008).
