Structures Used in Natural Language Processing
- Corpus – body of data, optionally tagged (for example, through part-of-speech tagging), providing real-world samples for analysis and comparison.
- Text corpus – large and structured set of texts, nowadays usually stored and processed electronically. Text corpora are used for statistical analysis and hypothesis testing, checking occurrences, or validating linguistic rules within a specific subject (or domain).
- Speech corpus – database of speech audio files and text transcriptions. In speech technology, speech corpora are used, among other things, to create acoustic models (which can then be used with a speech recognition engine). In linguistics, spoken corpora are used for research in phonetics, conversation analysis, dialectology and other fields.
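As an illustration of how a tagged text corpus supports counting occurrences and checking a linguistic rule, the sketch below represents a tiny, hypothetical corpus as a list of sentences, each a list of (token, tag) pairs with Penn Treebank-style tags. The corpus contents and the helper names (`token_counts`, `tag_counts`, `tags_for`, `follow_rate`) are assumptions made for this example; it uses plain Python (3.9+) rather than any particular NLP toolkit.

```python
from collections import Counter

# Hypothetical POS-tagged corpus: each sentence is a list of (token, tag) pairs.
TaggedSentence = list[tuple[str, str]]

corpus: list[TaggedSentence] = [
    [("The", "DT"), ("dog", "NN"), ("barks", "VBZ"), (".", ".")],
    [("The", "DT"), ("cat", "NN"), ("sleeps", "VBZ"), (".", ".")],
    [("A", "DT"), ("dog", "NN"), ("sleeps", "VBZ"), (".", ".")],
]


def token_counts(sentences: list[TaggedSentence]) -> Counter:
    """Count case-folded token occurrences across the whole corpus."""
    return Counter(tok.lower() for sent in sentences for tok, _ in sent)


def tag_counts(sentences: list[TaggedSentence]) -> Counter:
    """Count how often each part-of-speech tag occurs."""
    return Counter(tag for sent in sentences for _, tag in sent)


def tags_for(word: str, sentences: list[TaggedSentence]) -> set[str]:
    """Return the set of tags a word (case-insensitive) receives in the corpus."""
    return {tag for sent in sentences for tok, tag in sent if tok.lower() == word.lower()}


def follow_rate(sentences: list[TaggedSentence], first: str, nxt: str) -> float:
    """Fraction of `first`-tagged tokens that are immediately followed by a `nxt` tag.

    A simple way to test a hypothesised rule such as "a determiner is followed by a noun".
    Tokens at the end of a sentence have no successor and are not counted.
    """
    hits = total = 0
    for sent in sentences:
        for (_, t1), (_, t2) in zip(sent, sent[1:]):
            if t1 == first:
                total += 1
                hits += t2 == nxt
    return hits / total if total else 0.0


if __name__ == "__main__":
    print(token_counts(corpus).most_common(3))
    print(tag_counts(corpus))
    print(tags_for("dog", corpus))
    print(follow_rate(corpus, "DT", "NN"))
```

In this toy corpus every determiner is followed by a noun, so `follow_rate(corpus, "DT", "NN")` returns `1.0`; on a real corpus the same check would report how often the rule actually holds.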