Structures Used in Natural Language Processing
- Corpus – body of data, optionally tagged (for example, through part-of-speech tagging), providing real-world samples for analysis and comparison.
- Text corpus – large and structured set of texts, nowadays usually electronically stored and processed. Text corpora are used for statistical analysis and hypothesis testing, such as checking occurrences or validating linguistic rules within a specific subject (or domain).
- Speech corpus – database of speech audio files and text transcriptions. In speech technology, speech corpora are used, among other things, to create acoustic models (which can then be used with a speech recognition engine). In linguistics, spoken corpora are used for research in phonetics, conversation analysis, dialectology and other fields.
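
As a rough illustration of the entries above, a tagged text corpus can be sketched as a list of sentences, each holding (token, part-of-speech tag) pairs. The tiny corpus, the tag names and the helper functions below are all hypothetical and chosen only for this example; real corpora are far larger and usually loaded through a dedicated toolkit. The sketch counts occurrences and checks a toy linguistic rule, the two uses mentioned for text corpora.

```python
from collections import Counter

# Hypothetical toy corpus: each sentence is a list of (token, tag) pairs.
# The tag names (DET, NOUN, VERB) are an assumption for this example.
tagged_corpus = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("a", "DET"), ("dog", "NOUN"), ("sleeps", "VERB")],
]


def token_counts(corpus):
    """Count how often each token occurs across all sentences."""
    return Counter(token for sentence in corpus for token, _tag in sentence)


def tag_bigram_counts(corpus):
    """Count adjacent tag pairs within each sentence."""
    counts = Counter()
    for sentence in corpus:
        tags = [tag for _token, tag in sentence]
        counts.update(zip(tags, tags[1:]))
    return counts


def rule_holds(corpus, first_tag, expected_next_tag):
    """Check a toy rule: every tag following first_tag is expected_next_tag."""
    bigrams = tag_bigram_counts(corpus)
    return all(
        nxt == expected_next_tag
        for (first, nxt) in bigrams
        if first == first_tag
    )


if __name__ == "__main__":
    print(token_counts(tagged_corpus).most_common(3))
    print(tag_bigram_counts(tagged_corpus))
    print(rule_holds(tagged_corpus, "DET", "NOUN"))
```

A speech corpus could be sketched in the same spirit by pairing each transcription with the path of its audio file, though that layout is likewise only an assumption here.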