IETF Language Tag - Syntax of Language Tags

Each language tag is composed of one or more "subtags" separated by hyphens (-). Each subtag is made with basic Latin letters or digits only.

With the exception of private-use language tags beginning with an "x-" prefix and of grandfathered language tags (including those starting with an "i-" prefix and those previously registered in the IANA database for language tags), subtags occur in the following order:

  • a single primary language subtag composed of a two letter language code from ISO 639-1 (2002), or a three letter code from ISO 639-2 (1998), ISO 639-3 (2007) or ISO 639-5 (2008);
  • up to three optional extended language subtags composed of three letters each, separated by hyphens; (There is currently no extended language subtag registered in the IANA database without an equivalent and preferred primary language subtag. This component of language tags is preserved for backwards compatibility and to allow for future parts of ISO 639.)
  • an optional script subtag, composed of a four letter script code from ISO 15924 (usually written in title case);
  • an optional region subtag composed of a two letter country code from ISO 3166-1 alpha-2 (usually written in upper case), or a three digit code from UN M.49 for geographical regions;
  • optional variant subtags, separated by hyphens, each composed of five to eight alphanumeric characters, or of four characters starting with a digit; (Variant subtags are registered with IANA and not associated with any external standard. ISO 639-6 (2009) four letter language variant codes do not meet this specification and thus are not eligible for variant subtag registration.)
  • optional extension subtags, separated by hyphens, each composed of a single alphanumeric character other than the letter x, and a hyphen followed by one or more subtags of two to eight characters each, separated by hyphens; (Two extensions are currently registered: "u" for Unicode locale data, defined in RFC 6067, and "t" for transformed content, defined in RFC 6497.)
  • an optional private use subtag, composed of the letter x and a hyphen followed by subtags of one to eight characters each, separated by hyphens.
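
The ordering above can be sketched as a single regular expression. This is a minimal illustration in Python, not a complete validator: it assumes only the lengths and character classes listed in the bullets, it does not handle grandfathered tags, and it does not check subtags against the IANA registry. The names LANGTAG and parse_tag are hypothetical.

```python
import re

# Simplified pattern following the subtag order described in the text.
# Assumption: grandfathered tags and registry lookups are out of scope.
LANGTAG = re.compile(
    r"""
    ^(?:
        (?P<language>[a-z]{2,3})                                 # primary language
        (?P<extlang>(?:-[a-z]{3}){0,3})                          # extended language
        (?:-(?P<script>[a-z]{4}))?                               # script
        (?:-(?P<region>[a-z]{2}|[0-9]{3}))?                      # region
        (?P<variants>(?:-(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3}))*)   # variants
        (?P<extensions>(?:-[0-9a-wy-z](?:-[a-z0-9]{2,8})+)*)     # extensions
        (?:-(?P<privateuse>x(?:-[a-z0-9]{1,8})+))?               # private use
      |
        (?P<privateonly>x(?:-[a-z0-9]{1,8})+)                    # whole tag is private
    )$
    """,
    re.IGNORECASE | re.VERBOSE,
)


def parse_tag(tag):
    """Split a tag into named components, or return None if it does not fit."""
    m = LANGTAG.match(tag)
    if m is None:
        return None
    if m.group("privateonly"):
        return {"privateuse": m.group("privateonly")}
    parts = {"language": m.group("language")}
    for name in ("script", "region", "privateuse"):
        if m.group(name):
            parts[name] = m.group(name)
    # Repeating components are returned as lists without the leading hyphen.
    for name in ("extlang", "variants"):
        if m.group(name):
            parts[name] = m.group(name)[1:].split("-")
    if m.group("extensions"):
        parts["extensions"] = m.group("extensions")[1:]
    return parts
```

For instance, parse_tag("zh-cmn-Hans-CN") separates the primary language "zh", the extended language "cmn", the script "Hans" and the region "CN", while a malformed string such as "en--US" yields None.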

Subtags are not case sensitive, but the specification recommends using the same case as in the Language Subtag Registry, where region subtags are uppercase, script subtags are titlecase and all other subtags are lowercase. This capitalization follows the recommendations of the underlying ISO standards.
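
The recommended capitalization can be approximated from subtag position and length alone, without consulting the registry. The sketch below follows the positional rule given in RFC 5646: everything is lowercase, except that two-letter subtags become uppercase and four-letter subtags become title case when they are neither the first subtag nor located after a single-character subtag. The function name canonical_case is hypothetical, and the input is assumed to be a well-formed tag.

```python
def canonical_case(tag):
    """Apply the conventional capitalization to a well-formed language tag."""
    out = []
    after_singleton = False
    for i, sub in enumerate(tag.split("-")):
        sub = sub.lower()
        if i > 0 and not after_singleton:
            if len(sub) == 2:
                sub = sub.upper()        # region-style subtag
            elif len(sub) == 4:
                sub = sub.capitalize()   # script-style subtag
        if len(sub) == 1:
            # Everything after an extension or private-use singleton stays lowercase.
            after_singleton = True
        out.append(sub)
    return "-".join(out)
```

Because matching is case-insensitive, this normalization only affects presentation: "EN-latn-us" and "en-Latn-US" identify the same language.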

Optional subtags should be omitted when they add no distinguishing information to a language tag. For example, "es" is preferred over "es-Latn", as Spanish is fully expected to be written with Latin script; "arb" is preferred over "arb-Arab", as Modern Standard Arabic is understood to be written with Arabic script.
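
As an illustration, redundant script subtags could be dropped with a small lookup. The table EXPECTED_SCRIPT below is hypothetical and contains only the two examples from the paragraph above; a real implementation would presumably take this information from the Language Subtag Registry. The sketch also assumes the script subtag directly follows the primary language subtag, so tags with extended language subtags are left unchanged.

```python
# Hypothetical table holding only the two examples from the text.
EXPECTED_SCRIPT = {"es": "latn", "arb": "arab"}


def drop_redundant_script(tag):
    """Remove the script subtag when it is the expected one for the language."""
    subtags = tag.split("-")
    # A script subtag is four letters and, in this sketch, sits in second position.
    if len(subtags) >= 2 and len(subtags[1]) == 4 and subtags[1].isalpha():
        if EXPECTED_SCRIPT.get(subtags[0].lower()) == subtags[1].lower():
            del subtags[1]
    return "-".join(subtags)
```

With this table, "es-Latn-MX" is shortened to "es-MX", whereas "sr-Latn" is kept as written because no expected script is listed for "sr".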

Region subtags are often deprecated by the registration of specific primary language subtags from ISO 639-3 which are now "preferred values". For example, "ar-DZ" is deprecated with the preferred value "arq" for Algerian Spoken Arabic; "arq-DZ" is also deprecated, as the country code adds no further distinction. Most regional differences in languages are interpreted as differences of dialect, rather than being purely regional.

Not all linguistic regions can be represented with a valid region subtag: the subnational regional dialects of a primary language are currently registered specifically as variant subtags. For example, the "valencia" variant subtag for the Valencian dialect of Catalan is registered in the IANA database with the restricting language tag prefix "ca". The region subtag "ES" is implicit for this dialect spoken in two autonomous regions of Spain.
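
The prefix restriction on a variant can be checked mechanically. The following sketch uses a hypothetical one-entry table, VARIANT_PREFIXES, built from the "valencia" example above; it treats the registered prefix as a hard requirement for simplicity, and the function name is likewise an assumption.

```python
# Hypothetical excerpt: variant subtag -> language tag prefixes it is registered for.
VARIANT_PREFIXES = {"valencia": ("ca",)}


def variant_matches_prefix(tag):
    """Return False if a known variant appears under a tag lacking its prefix."""
    subtags = tag.lower().split("-")
    for i, sub in enumerate(subtags):
        if len(sub) == 1:
            break  # extension or private-use section: stop checking
        prefixes = VARIANT_PREFIXES.get(sub)
        if prefixes is not None and i > 0:
            head = subtags[:i]
            # The subtags before the variant must begin with one of the prefixes.
            if not any(head[:len(p.split("-"))] == p.split("-") for p in prefixes):
                return False
    return True
```

Under these assumptions both "ca-valencia" and "ca-ES-valencia" pass the check, while "es-valencia" does not.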

IETF language tags have been used as locale identifiers in many applications. It is recommended that other means be used for defining, encoding and matching locales, as this is out of scope of the IETF BCP 47 standards track.

The use, interpretation and matching of IETF language tags are currently defined in RFC 4647 in combination with RFC 5646 and RFC 5645. The Language Subtag Registry, maintained by IANA, lists all currently valid public subtags. Private use subtags are not registered in the IANA database, as they are implementation-dependent and subject to private agreements between the parties using them. These private agreements are out of scope of the BCP 47 standards track.
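
To give an idea of what matching involves, the sketch below implements the simplest scheme described in RFC 4647, basic filtering: a language range matches a tag if the two are equal, or if the range is a prefix of the tag and is immediately followed by a hyphen, with the comparison made case-insensitively and "*" matching every tag. The function name basic_filter is hypothetical, and the other matching schemes in RFC 4647 are not covered.

```python
def basic_filter(language_range, tags):
    """Return the tags matched by a basic language range, in their original order."""
    rng = language_range.lower()
    if rng == "*":
        return list(tags)  # the wildcard range matches everything
    return [
        t for t in tags
        if t.lower() == rng or t.lower().startswith(rng + "-")
    ]
```

The trailing hyphen in the prefix test is what keeps the range "de" from matching an unrelated tag such as "den", while still matching "de-CH".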
