TeX - Typesetting System

Typesetting System

TeX commands commonly start with a backslash and are grouped with curly braces. However, almost all of TeX's syntactic properties can be changed on the fly which makes TeX input hard to parse by anything but TeX itself. TeX is a macro and token-based language: many commands, including most user-defined ones, are expanded on the fly until only unexpandable tokens remain which get executed. Expansion itself is practically side-effect free. Tail recursion of macros takes no memory, and if-then-else constructs are available. This makes TeX a Turing-complete language even at the expansion level.

The system can be divided into four levels: in the first, characters are read from the input file and assigned a category code (sometimes called "catcode", for short). Combinations of a backslash (really: any character of category zero) followed by letters (characters of category 11) or a single other character are replaced by a control sequence token. In this sense this stage is like lexical analysis, although it does not form numbers from digits. In the next stage, expandable control sequences (such as conditionals or defined macros) are replaced by their replacement text. The input for the third stage is then a stream of characters (including ones with special meaning) and unexpandable control sequences (typically assignments and visual commands). Here characters get assembled into a paragraph. TeX's paragraph breaking algorithm works by optimizing breakpoints over the whole paragraph. The fourth stage breaks the vertical list of lines and other material into pages.

The TeX system has precise knowledge of the sizes of all characters and symbols, and using this information, it computes the optimal arrangement of letters per line and lines per page. It then produces a DVI file ("DeVice Independent") containing the final locations of all characters. This dvi file can be printed directly given an appropriate printer driver, or it can be converted to other formats. Nowadays, PDFTeX is often used which bypasses DVI generation altogether.

The base TeX system understands about 300 commands, called primitives. However, these low-level commands are rarely used directly by users, and most functionality is provided by format files (predumped memory images of TeX after large macro collections have been loaded). Knuth's original default format, which adds about 600 commands, is Plain TeX (available from CTAN). The most widely used format is LaTeX, originally developed by Leslie Lamport, which incorporates document styles for books, letters, slides, etc., and adds support for referencing and automatic numbering of sections and equations. Another widely used format, AMS-TeX, is produced by the American Mathematical Society, and provides many more user-friendly commands, which can be altered by journals to fit with their house style. Most of the features of AMS-TeX can be used in LaTeX by using the AMS "packages". This is then referred to as AMS-LaTeX. Other formats include ConTeXt, used primarily for desktop publishing and written mostly by Hans Hagen at Pragma.

Read more about this topic:  TeX

Famous quotes containing the word system:

    Delight at having understood a very abstract and obscure system leads most people to believe in the truth of what it demonstrates.
    —G.C. (Georg Christoph)