HTML Document Characters
Web pages are typically HTML or XHTML documents. Both types of documents consist, at a fundamental level, of characters, which are graphemes and grapheme-like units, independent of how they manifest in computer storage systems and networks.
An HTML document is a sequence of Unicode characters. More specifically, HTML 4.0 documents are required to consist of characters in the HTML document character set: a character repertoire wherein each character is assigned a unique, non-negative integer code point. This set is defined in the HTML 4.0 DTD, which also establishes the syntax (allowable sequences of characters) that can produce a valid HTML document. The HTML document character set for HTML 4.0 consists of most, but not all, of the characters jointly defined by Unicode and ISO/IEC 10646: the Universal Character Set (UCS).
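As a small illustration of the pairing between characters and code points described above, the following Python sketch maps each character of a string to its integer code point and back. It relies only on the built-ins ord and chr; the helper name code_points is hypothetical, and the sketch assumes Python's own model of a string as a sequence of Unicode code points rather than anything HTML itself mandates.

```python
def code_points(text):
    """Return the Unicode code point (a non-negative integer) of each character."""
    return [ord(ch) for ch in text]


sample = "A☺"
points = code_points(sample)

print(points)                                      # [65, 9786]
print([f"U+{p:04X}" for p in points])              # ['U+0041', 'U+263A']
print("".join(chr(p) for p in points) == sample)   # True: the mapping is reversible
```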
Like an HTML document, an XHTML document is a sequence of Unicode characters. However, an XHTML document is an XML document, which, while not having an explicit "document character" layer of abstraction, nevertheless relies upon a similar definition of permissible characters that covers most, but not all, of the Unicode/UCS character definitions. The sets used by HTML and XHTML/XML are slightly different, but these differences have little effect on the average document author.
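To make the idea of a set of permissible characters concrete, here is a minimal Python sketch of a membership test for the XML side only. The ranges are taken from the Char production of XML 1.0; the function name is_xml10_char is hypothetical, and the sketch does not attempt to model the HTML 4.0 document character set or the differences between the two sets.

```python
def is_xml10_char(code_point):
    """Return True if code_point is allowed by the XML 1.0 Char production."""
    return (
        code_point in (0x9, 0xA, 0xD)            # tab, line feed, carriage return
        or 0x20 <= code_point <= 0xD7FF          # most of the Basic Multilingual Plane
        or 0xE000 <= code_point <= 0xFFFD        # skips surrogates, U+FFFE and U+FFFF
        or 0x10000 <= code_point <= 0x10FFFF     # supplementary planes
    )


print(is_xml10_char(ord("☺")))   # True
print(is_xml10_char(0x0))        # False: the NUL character is excluded
print(is_xml10_char(0xD800))     # False: surrogate code points are excluded
```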
Regardless of whether the document is HTML or XHTML, when stored on a file system or transmitted over a network, the document's characters are encoded as a sequence of octets (bytes) according to a particular character encoding. This encoding may either be a Unicode Transformation Format, like UTF-8, that can directly encode any Unicode character, or a legacy encoding, like Windows-1252, that cannot. However, even when using encodings that do not support all Unicode characters, the encoded document may make use of numeric character references. For example, &#x263A; (☺) is used to indicate a smiling face character in the Unicode character set.
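The difference between the two kinds of encoding can be sketched in Python, under the assumption that Python's standard codecs stand in for whatever tool actually serializes the document. The sample string is made up for illustration. UTF-8 encodes the smiling face directly, Windows-1252 cannot, and the standard xmlcharrefreplace error handler falls back to a numeric character reference (in decimal rather than hexadecimal form).

```python
import html

text = "Smile: ☺"  # hypothetical document content containing U+263A

# UTF-8 can encode any Unicode character directly.
utf8_bytes = text.encode("utf-8")

# Windows-1252 has no byte for U+263A, so a plain encode fails.
try:
    text.encode("windows-1252")
    direct_ok = True
except UnicodeEncodeError:
    direct_ok = False

# Falling back to numeric character references keeps the document encodable.
legacy_bytes = text.encode("windows-1252", errors="xmlcharrefreplace")

print(utf8_bytes)     # b'Smile: \xe2\x98\xba'
print(direct_ok)      # False
print(legacy_bytes)   # b'Smile: &#9786;'

# Both the decimal and the hexadecimal reference resolve to the same character.
print(html.unescape("&#9786;") == html.unescape("&#x263A;") == "☺")  # True
```

The decimal reference &#9786; and the hexadecimal reference &#x263A; name the same code point, since 0x263A equals 9786.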