Memset - Character Encodings

Character Encodings

Each string ends at the first occurrence of the null character of the appropriate kind (char or wchar_t). A null character is a code unit whose value is zero. Consequently, a byte string can contain any non-NUL characters in ASCII or any ASCII extension, but not text in encodings such as UTF-16: even though a 16-bit code unit might be nonzero, its high or low byte might be zero, and that byte would be taken as the terminator. The encodings that can be stored in wide strings are determined by the width of wchar_t. In most implementations, wchar_t is at least 16 bits, so all 16-bit encodings, such as UCS-2, can be stored. If wchar_t is 32 bits, then 32-bit encodings, such as UTF-32, can be stored.
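The following is a minimal sketch of why UTF-16 data does not survive in a byte string. The array utf16le_hi and the helper apparent_byte_length are hypothetical names introduced only for this illustration, and the bytes are written out by hand as little-endian UTF-16.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* "Hi" encoded by hand as UTF-16LE, followed by a 16-bit terminator.
   Each ASCII character becomes a byte pair whose high byte is zero. */
static const char utf16le_hi[] = { 'H', 0x00, 'i', 0x00, 0x00, 0x00 };

/* Hypothetical helper: the length a byte-string function would report.
   strlen stops at the first zero byte, wherever it occurs. */
size_t apparent_byte_length(const char *s)
{
    assert(s != NULL);
    return strlen(s);
}
```

Calling apparent_byte_length(utf16le_hi) reports a length of 1, because the zero high byte of 'H' is treated as the end of the string even though the array holds six bytes.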

Variable-width encodings can be used in both byte strings and wide strings. String lengths and offsets are measured in bytes or in wchar_t units, not in "characters", which can be confusing to beginning programmers. UTF-8 and Shift JIS are often used in C byte strings, while UTF-16 is often used in C wide strings when wchar_t is 16 bits. Truncating strings that contain variable-length characters with functions like strncpy can leave an incomplete, and therefore invalid, sequence at the end of the string. This can be unsafe if the truncated result is later interpreted by code that assumes its input is valid.
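As a rough sketch of the difference between bytes and characters, the two helpers below count UTF-8 code points and truncate on a code point boundary. Both utf8_code_points and utf8_copy_truncate are hypothetical names, not standard library functions, and both assume the input is already valid UTF-8.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count code points in a UTF-8 string (assumed valid) by counting
   every byte that is not a continuation byte (bit pattern 10xxxxxx). */
size_t utf8_code_points(const char *s)
{
    size_t n = 0;
    assert(s != NULL);
    for (; *s != '\0'; ++s) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            ++n;
    }
    return n;
}

/* Copy at most dst_size - 1 bytes of src into dst without splitting a
   multibyte sequence. Always null-terminates dst and returns the
   number of bytes copied (excluding the terminator). */
size_t utf8_copy_truncate(char *dst, size_t dst_size, const char *src)
{
    size_t len, n;
    assert(dst != NULL && src != NULL && dst_size > 0);
    len = strlen(src);
    n = len < dst_size - 1 ? len : dst_size - 1;
    /* If the cut falls inside a sequence, src[n] is a continuation
       byte; back up until the cut sits just before a lead byte. */
    if (n < len) {
        while (n > 0 && ((unsigned char)src[n] & 0xC0) == 0x80)
            --n;
    }
    memcpy(dst, src, n);
    dst[n] = '\0';
    return n;
}
```

For the two-character string "\xCF\x86\xCF\x89" (φω in UTF-8), strlen reports 4 while utf8_code_points reports 2. Copying "a\xCF\x86" into a three-byte buffer with utf8_copy_truncate keeps only "a", whereas a plain byte-count truncation would leave a dangling lead byte 0xCF.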

Support for Unicode literals such as char foo[] = "φωωβαρ"; (UTF-8) or wchar_t foo[] = L"φωωβαρ"; (UTF-16 or UTF-32) is implementation-defined, and may require that the source code be in the same encoding. Some compilers or editors will require entering all non-ASCII characters as \xNN sequences for each byte of UTF-8, and/or as \uNNNN universal character names (or \UNNNNNNNN for code points above U+FFFF).
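A small sketch of the escape-sequence spelling, assuming a C99 or later compiler: the same six Greek letters are written once as explicit UTF-8 bytes and once as universal character names. The names foo_utf8, foo_wide, and literals_agree are chosen only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* "φωωβαρ" spelled as explicit UTF-8 bytes. Each character is its own
   literal; adjacent literals are concatenated, which also keeps a \x
   escape from absorbing any hex digits that might follow it. */
static const char foo_utf8[] =
    "\xCF\x86" "\xCF\x89" "\xCF\x89" "\xCE\xB2" "\xCE\xB1" "\xCF\x81";

/* The same text as universal character names. How it is stored
   depends on the implementation's wchar_t (UTF-16 or UTF-32). */
static const wchar_t foo_wide[] = L"\u03C6\u03C9\u03C9\u03B2\u03B1\u03C1";

/* Hypothetical check: six characters occupy 12 bytes in UTF-8 and
   6 wchar_t units, since every letter here is in the Basic
   Multilingual Plane. Returns nonzero when both lengths match. */
int literals_agree(void)
{
    assert(sizeof foo_utf8 == 13);   /* 12 bytes plus the terminator */
    return strlen(foo_utf8) == 12 && wcslen(foo_wide) == 6;
}
```

Spelling the bytes out this way keeps the source file pure ASCII, so the result does not depend on the encoding in which the compiler reads the file.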
