Memset - Character Encodings

Character Encodings

Each string ends at the first occurrence of the null character of the appropriate kind (char or wchar_t). A null character is a character whose value is zero. Consequently, a byte string can hold any non-NUL character in ASCII or an ASCII extension, but not text in encodings such as UTF-16: even when a 16-bit code unit is nonzero, its high or low byte may be zero, and that byte would end the string early. The encodings that can be stored in wide strings are determined by the width of wchar_t. In most implementations, wchar_t is at least 16 bits, so any 16-bit encoding, such as UCS-2, can be stored. If wchar_t is 32 bits, then 32-bit encodings, such as UTF-32, can be stored.
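The effect of an embedded zero byte can be shown with a small sketch. The buffer utf16le_hi and both helper functions below are hypothetical names introduced only for illustration; the sketch assumes little-endian UTF-16 and is not a general-purpose UTF-16 routine.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* "Hi" encoded as UTF-16LE (illustrative data). Each code unit is two bytes,
   and for ASCII-range characters the high byte is zero. The last two zero
   bytes form the 16-bit terminator. */
static const char utf16le_hi[] = { 'H', 0x00, 'i', 0x00, 0x00, 0x00 };

/* Length as seen by byte-string functions: stops at the first zero byte. */
size_t byte_string_length(const char *s)
{
    assert(s != NULL);
    return strlen(s);
}

/* Length in 16-bit code units of a UTF-16LE buffer, found by scanning for
   a pair of zero bytes on a code-unit boundary. */
size_t utf16le_length(const char *s)
{
    assert(s != NULL);
    size_t n = 0;
    while (s[2 * n] != 0 || s[2 * n + 1] != 0)
        n++;
    return n;
}
```

Calling byte_string_length(utf16le_hi) reports a single byte, because the zero high byte of 'H' is taken as the terminator, while utf16le_length(utf16le_hi) sees both code units.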

Variable-width encodings can be used in both byte strings and wide strings. String length and offsets are measured in bytes or wchar_t units, not in "characters", which can confuse beginning programmers. UTF-8 and Shift JIS are often used in C byte strings, while UTF-16 is often used in C wide strings when wchar_t is 16 bits. Truncating strings that contain variable-length characters with functions like strncpy can leave an invalid sequence at the end of the string. This can be unsafe if the truncated string is later interpreted by code that assumes its input is valid.
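As a minimal sketch of the difference between bytes and characters, the hypothetical helpers below count UTF-8 code points and truncate on a character boundary instead of an arbitrary byte. They assume the input is already valid UTF-8 and are not part of the standard library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A UTF-8 continuation byte has the bit pattern 10xxxxxx. */
static int is_continuation(char c)
{
    return ((unsigned char)c & 0xC0) == 0x80;
}

/* Count code points by counting every byte that is not a continuation byte. */
size_t utf8_count_code_points(const char *s)
{
    assert(s != NULL);
    size_t count = 0;
    for (; *s != '\0'; s++)
        if (!is_continuation(*s))
            count++;
    return count;
}

/* Largest length <= max_bytes that does not split a multibyte sequence. */
size_t utf8_truncated_length(const char *s, size_t max_bytes)
{
    assert(s != NULL);
    size_t len = strlen(s);
    if (len <= max_bytes)
        return len;
    size_t n = max_bytes;
    /* If the first dropped byte is a continuation byte, the cut falls
       inside a character, so back up to the start of that character. */
    while (n > 0 && is_continuation(s[n]))
        n--;
    return n;
}

/* Copy src into dst (capacity dst_size, including the terminator),
   truncating on a character boundary. Returns the bytes copied. */
size_t utf8_copy_truncated(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL && dst_size > 0);
    size_t n = utf8_truncated_length(src, dst_size - 1);
    memcpy(dst, src, n);
    dst[n] = '\0';
    return n;
}
```

For the four-byte string "\xCF\x86\xCF\x89" (the UTF-8 bytes of φω), strlen reports 4 while utf8_count_code_points reports 2, and copying into a four-byte buffer keeps only the first character rather than leaving half of the second one behind.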

Support for Unicode literals such as char foo[] = "φωωβαρ"; (UTF-8) or wchar_t foo[] = L"φωωβαρ"; (UTF-16 or UTF-32, depending on the width of wchar_t) is implementation-defined, and may require that the source code be in the same encoding. Some compilers or editors require entering all non-ASCII characters as \xNN sequences for each byte of UTF-8, and/or \uNNNN for each word of UTF-16.
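A hedged sketch of the escape-sequence approach follows, using the single character φ (U+03C6) so that the source file stays pure ASCII. The names phi_utf8, phi_wide, byte_units and wide_units are illustrative only, and the wide literal assumes a compiler whose wide execution character set can represent U+03C6 in one wchar_t, which is common but not guaranteed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* U+03C6 written with escapes only (illustrative data). */
static const char    phi_utf8[] = "\xCF\x86"; /* the two UTF-8 bytes of U+03C6 */
static const wchar_t phi_wide[] = L"\u03C6";  /* one universal character name  */

/* Number of char elements before the terminator. */
size_t byte_units(const char *s)
{
    assert(s != NULL);
    return strlen(s);
}

/* Number of wchar_t elements before the terminator. */
size_t wide_units(const wchar_t *s)
{
    assert(s != NULL);
    return wcslen(s);
}
```

The same character therefore occupies two elements in the byte string and, under the assumption above, one element in the wide string.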
