SXML - SXML Specification

SXML Specification

As noted in the previous section, SXML is the concrete instance of the XML Infoset in the form of S-expressions. Further discussion on SXML in this section is based on the SXML specification.

A simplified SXML grammar in EBNF notation is presented below. An SXML is a single Scheme symbol.

::= ( *TOP* * ) ::= ( ? * ) ::= ( @ * ) ::= ( "value"? ) ::= | "character data" | ::= ( *PI* pi-target "processing instruction content string" )

Since an information item in the XML Infoset is a sum of its properties, a list is a particularly suitable data structure to represent an item. The head of the list, a Scheme identifier, names the item. For many items this is their (expanded) item name. For an information item that denotes an XML element, the corresponding list starts with element name, optionally followed by a collection of attributes. The rest of the element item list is an ordered sequence of element children, character data, processing instructions, and other elements in turn. Every child is unique; items never share their children even if the latter have the identical content.

The following example illustrates an XML element and its SXML form (which satisfies the production in SXML grammar).

67 95 (WEIGHT (@ (unit "pound")) (NET (@ (certified)) "67") (GROSS "95"))

The value of an attribute is normally a string; it may be omitted (in case of HTML) for a boolean attribute, e.g., a "certified" attribute in the above example.

A collection of attributes is considered an information item in its own right, tagged with a special name @. The character "@" may not occur in a well-formed XML name; therefore an cannot be mistaken for a list that represents an element. An XML document renders attributes, processing instructions and other meta-data differently from the element markup. In contrast, SXML represents element content and meta-data uniformly—as tagged lists. RELAX NG—a schema language for XML—also aims to treat attributes as uniformly as possible with elements. This uniform treatment, argues James Clark, is a significant factor in simplifying the language. SXML takes advantage of the fact that every XML name is also a valid Scheme identifier, but not every Scheme identifier is a valid XML name. This observation lets us introduce administrative names such as @, *PI*, *TOP* without worrying about potential name clashes. The observation also makes the relationship between XML and SXML well-defined. An XML document converted to SXML can be reconstructed into an equivalent XML document (in terms of the Infoset). Moreover, due to the implementation freedom given by the Infoset specification, SXML itself is an instance of the Infoset.

The XML Recommendation specifies that processing instructions (PI) are distinct from elements and character data; processing instructions must be passed through to applications. In SXML, PIs are therefore represented by nodes of a dedicated type *PI*. XPath and the DOM Level 2 treat processing instructions in a similar way.

A sample XML document and its SXML representation are both shown below, thus providing an illustrative comparison between nested XML tags and nested SXML lists. Note that the SXML document is a bit more compact than its XML counterpart.

783 (*TOP* (*PI* xml "version='1.0'") (di (@ (contract "728g")) (wt (@ (refnum "345")) (delivery (date (@ (month "6") (day "1") (year "2001"))) (weight "783")) (vehicle (@ (type "lorry") (number "A567TP99")))) (wt (@ (refnum "459")) (vehicle (@ (type "car") (number "25676043"))))))

SXML can also be considered an abstract syntax tree for a parsed XML document. An XML document or a well-formed part of it can automatically be converted into the corresponding SXML form via a functional Scheme XML parsing framework SSAX.

It is worth a note that SXML represents all the information contained in XML documents, including comments, namespaces and external entities. These are omitted in this section for the sake of simplicity, but they are considered in the SXML specification.

Read more about this topic:  SXML