Structures Used in Natural Language Processing
- Corpus – body of data, optionally tagged (for example, through part-of-speech tagging), providing real-world samples for analysis and comparison.
- Text corpus – large and structured set of texts, nowadays usually electronically stored and processed. Text corpora are used for statistical analysis and hypothesis testing, such as checking occurrences or validating linguistic rules within a specific subject (or domain).
- Speech corpus – database of speech audio files and text transcriptions. In speech technology, speech corpora are used, among other things, to create acoustic models (which can then be used with a speech recognition engine). In linguistics, spoken corpora are used for research in phonetics, conversation analysis, dialectology, and other fields.
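The kind of analysis a tagged text corpus supports can be illustrated with a minimal sketch. It assumes a tiny in-memory corpus stored as lists of (token, tag) pairs; the sample sentences, the tag names, and the helper functions `token_counts`, `tag_counts`, and `follows_rule` are all hypothetical and chosen only for illustration. Real corpora are far larger and are usually read through a toolkit's corpus reader.

```python
from collections import Counter

# Hypothetical toy corpus: each sentence is a list of (token, part-of-speech tag) pairs.
TAGGED_CORPUS = [
    [("the", "DET"), ("cat", "NOUN"), ("sat", "VERB")],
    [("the", "DET"), ("dog", "NOUN"), ("barked", "VERB")],
    [("a", "DET"), ("cat", "NOUN"), ("slept", "VERB")],
]


def token_counts(corpus):
    """Count how often each token occurs across all sentences."""
    return Counter(token for sentence in corpus for token, _tag in sentence)


def tag_counts(corpus):
    """Count how often each part-of-speech tag occurs."""
    return Counter(tag for sentence in corpus for _token, tag in sentence)


def follows_rule(corpus, first_tag, second_tag):
    """Fraction of tokens tagged first_tag that are immediately followed by second_tag."""
    total = 0
    hits = 0
    for sentence in corpus:
        # Pair each token with its successor; the last token has no successor (None).
        for (_token, tag), nxt in zip(sentence, sentence[1:] + [None]):
            if tag == first_tag:
                total += 1
                if nxt is not None and nxt[1] == second_tag:
                    hits += 1
    return hits / total if total else 0.0
```

Calling `token_counts(TAGGED_CORPUS)` gives occurrence counts per word, and `follows_rule(TAGGED_CORPUS, "DET", "NOUN")` checks how consistently a simple candidate rule ("a determiner is followed by a noun") holds in the sample.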