Hapax Legomenon - Computer Science

Computer Science

In computational linguistics and natural language processing (NLP), especially corpus linguistics and machine-learning-based NLP, it is common to disregard hapax legomena (and sometimes other infrequent words), because a word seen only once provides too little evidence to be of much value to statistical techniques. Discarding them also significantly reduces an application's memory use: by Zipf's law, word frequency falls off steeply with rank, so a large share of the distinct words (types) in a corpus are hapaxes.
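The filtering step can be illustrated with a minimal sketch in Python. It assumes simple whitespace tokenization, a hypothetical frequency threshold `min_count` (with `min_count=2`, every hapax is dropped), and a hypothetical placeholder token `<UNK>` that stands in for every discarded word. Real toolkits expose similar thresholds under their own names, so this is an illustration of the idea, not any particular library's behavior.

```python
from collections import Counter


def hapaxes(tokens):
    """Return the word types that occur exactly once, sorted alphabetically."""
    counts = Counter(tokens)
    return sorted(word for word, c in counts.items() if c == 1)


def build_vocab(tokens, min_count=2):
    """Keep only word types occurring at least `min_count` times.

    `min_count` is a hypothetical parameter name; with the default of 2,
    all hapax legomena (count == 1) are excluded from the vocabulary.
    """
    counts = Counter(tokens)
    return {word for word, c in counts.items() if c >= min_count}


def replace_rare(tokens, vocab, unk="<UNK>"):
    """Map every out-of-vocabulary token to one shared placeholder.

    `<UNK>` is an assumed convention, not a fixed standard.
    """
    return [t if t in vocab else unk for t in tokens]


if __name__ == "__main__":
    # Naive whitespace tokenization, assumed for the example.
    tokens = "the cat sat on the mat and the dog sat on the log".split()
    print(hapaxes(tokens))             # ['and', 'cat', 'dog', 'log', 'mat']
    vocab = build_vocab(tokens)
    print(sorted(vocab))               # ['on', 'sat', 'the']
    print(replace_rare(tokens, vocab))
```

Even in this 13-token sentence, 5 of the 8 distinct types are hapaxes, so dropping them shrinks the vocabulary from 8 entries to 3 (plus the placeholder). On a real corpus the same filter is what yields the memory savings described above, at the cost of treating all rare words as indistinguishable.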
