Syntax (programming Languages) - Syntax Definition

Syntax Definition

The syntax of textual programming languages is usually defined using a combination of regular expressions (for lexical structure) and Backus–Naur Form (for grammatical structure) to inductively specify syntactic categories (nonterminals) and terminal symbols. Syntactic categories are defined by rules called productions, which specify the values that belong to a particular syntactic category. Terminal symbols are the concrete characters or strings of characters (for example keywords such as define, if, let, or void) from which syntactically valid programs are constructed.

Below is a simple grammar, based on Lisp, defined using the notation of regular expressions and Backus–Naur Form. It defines productions for the syntactic categories expression, atom, number, symbol, and list:

expression ::= atom | list
atom ::= number | symbol
number ::= [+-]?['0'-'9']+
symbol ::= ['A'-'Z''a'-'z'].*
list ::= '(' expression* ')'

This grammar specifies the following:

  • an expression is either an atom or a list;
  • an atom is either a number or a symbol;
  • a number is an unbroken sequence of one or more decimal digits, optionally preceded by a plus or minus sign;
  • a symbol is a letter followed by zero or more characters of any kind (excluding whitespace); and
  • a list is a matched pair of parentheses, with zero or more expressions inside it.

Here the decimal digits, upper- and lower-case characters, and parentheses are terminal symbols.
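The lexical rules for number and symbol can be sketched with Python's re module. This is only an illustration, not part of the grammar itself: the names TOKEN_RE and tokenize are hypothetical, and excluding parentheses from symbol characters is an assumption made here so that input such as "c232)" splits into a symbol followed by a closing parenthesis.

```python
import re

# Token patterns following the grammar's lexical rules. Excluding "(" and ")"
# from symbol characters is an assumption of this sketch, not of the grammar.
TOKEN_RE = re.compile(r"""
    (?P<number>[+-]?[0-9]+)          # optional sign, one or more digits
  | (?P<symbol>[A-Za-z][^\s()]*)     # a letter, then non-space characters
  | (?P<lparen>\()
  | (?P<rparen>\))
  | (?P<space>\s+)                   # whitespace separates tokens
""", re.VERBOSE)


def tokenize(text):
    """Return a list of (kind, lexeme) pairs, or raise ValueError."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        if match.lastgroup != "space":
            # Whitespace is consumed but not reported as a token.
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens
```

For example, tokenize("(a 1)") yields a left parenthesis, the symbol a, the number 1, and a right parenthesis, in that order.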

The following are examples of well-formed token sequences in this grammar: '12345', '()', '(a b c232 (1))'
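Whether a string is well formed under the productions can be checked with a small recursive-descent parser, one function per nonterminal. The following is a minimal sketch under stated assumptions: the names parse and parse_expression are hypothetical, parentheses are assumed to always stand alone as tokens, and numbers are converted to Python integers purely for convenience.

```python
import re

NUMBER = re.compile(r"[+-]?[0-9]+")   # number ::= optional sign, then digits
SYMBOL = re.compile(r"[A-Za-z]\S*")   # symbol ::= letter, then non-space chars


def parse(text):
    """Parse one expression; raise SyntaxError if text is not well formed."""
    # Assumption: parentheses are always separate tokens.
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    tree, end = parse_expression(tokens, 0)
    if end != len(tokens):
        raise SyntaxError("unexpected input after expression")
    return tree


def parse_expression(tokens, i):
    """expression ::= atom | list. Returns (tree, index of next token)."""
    if i >= len(tokens):
        raise SyntaxError("unexpected end of input")
    tok = tokens[i]
    if tok == "(":
        # list ::= '(' expression* ')'
        items = []
        i += 1
        while i < len(tokens) and tokens[i] != ")":
            item, i = parse_expression(tokens, i)
            items.append(item)
        if i >= len(tokens):
            raise SyntaxError("missing ')'")
        return items, i + 1
    if tok == ")":
        raise SyntaxError("unexpected ')'")
    # atom ::= number | symbol
    if NUMBER.fullmatch(tok):
        return int(tok), i + 1
    if SYMBOL.fullmatch(tok):
        return tok, i + 1
    raise SyntaxError(f"not a number or symbol: {tok!r}")
```

With this sketch, parse("(a b c232 (1))") returns the nested list ['a', 'b', 'c232', [1]], while an empty string or an unclosed "(a" raises SyntaxError.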

The grammar needed to specify a programming language can be classified by its position in the Chomsky hierarchy. The syntax of most programming languages can be specified using a Type-2 grammar, i.e., a context-free grammar. However, there are exceptions. In some languages, such as Perl and Lisp, the specification (or implementation) of the language allows constructs that execute during the parsing phase. Furthermore, these languages have constructs that allow the programmer to alter the behavior of the parser. This combination blurs the distinction between parsing and execution and makes syntax analysis an undecidable problem in these languages, meaning that the parsing phase may not finish. For example, in Perl it is possible to execute code during parsing using a BEGIN block, and Perl function prototypes may alter the syntactic interpretation, and possibly even the syntactic validity, of the remaining code. Similarly, Lisp macros introduced by the defmacro syntax also execute during parsing, meaning that a Lisp compiler must have an entire Lisp run-time system present. In contrast, C macros are merely textual replacements performed by the preprocessor and do not require code execution.
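One way to see why the list production puts the example grammar at the context-free (Type-2) level rather than the regular (Type-3) level: recognizing arbitrarily deep matched parentheses requires keeping track of nesting depth, which a finite automaton cannot do. The sketch below uses an unbounded counter for that purpose; the function name parens_balanced is hypothetical.

```python
def parens_balanced(text):
    """Return True if every '(' in text has a matching ')'."""
    depth = 0  # current nesting depth; unbounded, unlike a finite automaton's state
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                # A ')' appeared before its matching '('.
                return False
    return depth == 0
```

Here parens_balanced("(a b c232 (1))") is true, while "(()" and ")(" are both rejected.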
