Hyphen - in Computing

In Computing

In the ASCII character encoding, the hyphen is encoded as character 45 (hexadecimal 2D). This character is properly called the hyphen-minus, and it also serves as the minus sign and as a stand-in for dashes. In Unicode, the hyphen-minus is encoded as U+002D (-) so that Unicode remains compatible with ASCII. However, Unicode also encodes the hyphen and the minus sign separately, as U+2010 (‐) and U+2212 (−) respectively, along with the em dash U+2014 (—), the en dash U+2013 (–), and other related characters. The hyphen-minus is a general-purpose character that tries to fill several roles; wherever accurate typography is needed, the correct hyphen, minus sign, or other symbol should be used instead. For example, compare 4+3−2=5 (minus sign) with 4+3-2=5 (hyphen-minus): in most fonts the hyphen-minus does not have the correct width, stroke thickness, or vertical position for a minus sign.
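To make the code points above concrete, the following Python sketch uses the standard unicodedata module to print each character's code point and official Unicode name. It also shows that the hyphen-minus is the same single byte (45, or 0x2D) in both ASCII and UTF-8, while the minus sign has no ASCII encoding at all. The helper name describe is illustrative only.

```python
import unicodedata

# Hyphen-minus, hyphen, minus sign, em dash, en dash (as string escapes).
CHARACTERS = ["-", "\u2010", "\u2212", "\u2014", "\u2013"]

def describe(ch: str) -> str:
    """Return 'U+XXXX NAME' for a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

for ch in CHARACTERS:
    print(describe(ch))

# ASCII compatibility: the hyphen-minus encodes to byte 45 in ASCII and UTF-8.
assert "-".encode("ascii") == "-".encode("utf-8") == bytes([45])

# The true minus sign cannot be represented in ASCII.
try:
    "\u2212".encode("ascii")
except UnicodeEncodeError:
    print("U+2212 is not encodable in ASCII")
```

The loop prints U+002D HYPHEN-MINUS, U+2010 HYPHEN, U+2212 MINUS SIGN, U+2014 EM DASH and U+2013 EN DASH, one per line.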

However, the Unicode hyphen is awkward to enter on most keyboards, so the hyphen-minus remains very common. It is often used in place of dashes or minus signs where the proper characters are unavailable (as in ASCII-only text) or difficult to enter, or when the writer is unaware of the distinction. Some writers use two hyphen-minuses (--) to represent a dash in ASCII text.
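When text must be reduced to ASCII, the substitutions described above can be applied mechanically. The Python sketch below folds the Unicode hyphen, non-breaking hyphen, minus sign and dashes into hyphen-minus characters, writing the em dash as two hyphen-minuses. The table ASCII_FALLBACK and the function to_ascii_dashes are hypothetical, and the choice of a single hyphen-minus for the en dash is an assumption; some writers use "--" for it as well.

```python
# Hypothetical fallback table: typographic character -> ASCII replacement.
ASCII_FALLBACK = str.maketrans({
    "\u2010": "-",   # HYPHEN
    "\u2011": "-",   # NON-BREAKING HYPHEN
    "\u2212": "-",   # MINUS SIGN
    "\u2013": "-",   # EN DASH (assumed single hyphen-minus)
    "\u2014": "--",  # EM DASH, written as two hyphen-minuses
})

def to_ascii_dashes(text: str) -> str:
    """Replace typographic hyphens, dashes and the minus sign with ASCII stand-ins."""
    return text.translate(ASCII_FALLBACK)

print(to_ascii_dashes("4+3\u22122=5"))                # 4+3-2=5
print(to_ascii_dashes("ASCII\u2014where possible"))   # ASCII--where possible
```

str.translate is used because a single pass handles both one-to-one and one-to-many replacements.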

Because it is difficult for a computer program to decide automatically where to hyphenate a word at a line break, the soft hyphen was introduced. It lets an author mark a place where a hyphenated break is allowed, without forcing a break at an inconvenient place if the text is later reflowed. Unicode encodes the soft hyphen as U+00AD, which can be written in HTML as &shy;. In contrast, a hyphen that is always displayed and printed is called a hard hyphen (though some use this term for a non-breaking hyphen; see below). Soft hyphens are inserted into the text at the positions where hyphenation may occur and stay invisible unless a line actually breaks there. Inserting them by hand is tedious, so tools that apply hyphenation algorithms are available to insert them automatically. For web pages, the CSS Text Module Level 3 defines a hyphens property; with hyphens: auto, the browser may hyphenate text automatically using a hyphenation resource appropriate to the content's declared language.
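The way a renderer honours soft hyphens can be modelled in a few lines. The Python sketch below is a simplified, hypothetical line-fitting step: it assumes one column per character and considers a single word. If the whole word fits, the soft hyphens stay invisible; otherwise the word is broken at the last soft hyphen whose head, plus a visible hyphen, still fits; if none fits, the word moves to the next line whole. The function break_word is illustrative, not a real layout API.

```python
from typing import Tuple

SHY = "\u00ad"  # SOFT HYPHEN (HTML &shy;)

def break_word(word: str, room: int) -> Tuple[str, str]:
    """Fit `word` into `room` columns, breaking only at soft hyphens.

    Returns (head, tail). Assumes every character is one column wide.
    """
    parts = word.split(SHY)
    visible = "".join(parts)          # soft hyphens are not displayed
    if len(visible) <= room:
        return visible, ""            # no break needed
    best = 0                          # number of leading parts kept on this line
    for i in range(1, len(parts)):
        if len("".join(parts[:i])) + 1 <= room:  # +1 for the displayed hyphen
            best = i
    if best == 0:
        return "", visible            # no allowed break fits: move the whole word
    return "".join(parts[:best]) + "-", "".join(parts[best:])

word = f"com{SHY}put{SHY}ing"
print(break_word(word, 20))  # ('computing', '')
print(break_word(word, 7))   # ('comput-', 'ing')
print(break_word(word, 3))   # ('', 'computing')
```

A real layout engine measures glyph widths rather than counting characters, but the decision it makes at each soft hyphen follows the same shape.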

Most text systems treat a hyphen as a word boundary and a valid point at which to break a line when flowing text. This is not always desirable, especially when a break could cause ambiguity (as in the examples given before, where recreation and re‑creation would be indistinguishable), or in languages other than English (e.g. a line break at the hyphen in Irish an t‑athair or Romanian s‑a would be undesirable). For this purpose, Unicode also encodes a non-breaking hyphen as U+2011 (‑, which can be written in HTML as &#8209;). This character looks identical to the regular hyphen, but word processors treat it as a letter: a hyphenated word is not divided at a non-breaking hyphen that falls at the end of a line; instead, the whole word either stays at the end of the line or moves to the beginning of the next one.
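Python's standard textwrap module is one example of a text system that treats the hyphen-minus as a break point: its break_on_hyphens option, on by default, allows breaks after a hyphen inside a compound word. The sketch below wraps the same phrase twice, once with an ASCII hyphen-minus and once with U+2011, which textwrap does not treat as a break point. The helper wrap_both and the sample phrase are illustrative.

```python
import textwrap
from typing import List, Tuple

def wrap_both(width: int = 12) -> Tuple[List[str], List[str]]:
    plain = "the re-creation of it"          # ASCII hyphen-minus
    guarded = "the re\u2011creation of it"   # U+2011 NON-BREAKING HYPHEN
    # textwrap may break after a hyphen-minus in a compound word
    # (break_on_hyphens=True by default); U+2011 is not a break point for it.
    return textwrap.wrap(plain, width=width), textwrap.wrap(guarded, width=width)

plain_lines, guarded_lines = wrap_both()
print(plain_lines)
print(guarded_lines)
```

With width 12, the first result ends a line with re- and carries creation to the next line, while the second keeps re‑creation whole on a line of its own. The width is chosen so the 11-character word fits; otherwise textwrap's break_long_words default would split the over-long word anyway.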

The ASCII hyphen-minus is also widely used to introduce command-line options. It is usually followed by one or more letters that select specific behavior; in this context the character is typically called a dash, and the option it introduces a switch. Various implementations of the getopt function for parsing command-line options additionally accept two hyphen-minus characters (--) to introduce long option names that are more descriptive than their single-letter equivalents. Programs written with pipelining in mind use the hyphen in another way: a single hyphen may be accepted in place of a filename, indicating that a standard stream, rather than a file, should be used.
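Both conventions can be seen in a small sketch built on Python's standard getopt module, a C-style parser that also accepts long options. The program is a hypothetical head-like tool: the options -n N and --lines=N, the default of 10 lines, and the functions parse_args, open_input, head and main are all illustrative assumptions. A lone "-" is left as an ordinary operand by getopt, and open_input maps it to standard input.

```python
import getopt
import sys
from itertools import islice
from typing import Iterable, List, TextIO, Tuple

def parse_args(argv: List[str]) -> Tuple[int, List[str]]:
    """Parse hypothetical -n N / --lines=N options; return (count, operands)."""
    opts, operands = getopt.getopt(argv, "n:", ["lines="])
    count = 10  # assumed default
    for opt, value in opts:
        if opt in ("-n", "--lines"):
            count = int(value)
    return count, operands or ["-"]  # no file named: read standard input

def open_input(name: str) -> TextIO:
    """Treat a lone hyphen-minus as standard input rather than a file name."""
    return sys.stdin if name == "-" else open(name, encoding="utf-8")

def head(stream: Iterable[str], count: int) -> List[str]:
    """Return the first `count` lines of a text stream."""
    return list(islice(stream, count))

def main(argv: List[str]) -> None:
    count, names = parse_args(argv)
    for name in names:
        stream = open_input(name)
        try:
            sys.stdout.writelines(head(stream, count))
        finally:
            if stream is not sys.stdin:  # never close the shared stdin
                stream.close()
```

A script would call main(sys.argv[1:]); invoked with -n 3 -, it reads three lines from standard input, so it can sit in the middle of a pipeline, while --lines=3 notes.txt reads from a (hypothetical) file instead.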