Natural Language Processing Toolkits - Structures Used in Natural Language Processing

  • Corpus – body of data, optionally tagged (for example, with part-of-speech tags), that provides real-world samples for analysis and comparison.
    • Text corpus – large, structured set of texts, nowadays usually stored and processed electronically. Text corpora are used for statistical analysis and hypothesis testing, such as checking occurrences or validating linguistic rules within a specific subject (or domain).
    • Speech corpus – database of speech audio files and their text transcriptions. In speech technology, speech corpora are used, among other things, to create acoustic models (which can then be used with a speech recognition engine). In linguistics, spoken corpora are used for research in phonetics, conversation analysis, dialectology and other fields.
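
As a rough illustration of the corpus entries above, a tagged text corpus can be modelled as a list of sentences, each a list of (token, tag) pairs. The sketch below uses only the Python standard library. The toy sentences, the tag names, and the helper functions `token_frequencies`, `tag_frequencies` and `check_rule` are all assumptions made up for this example, not part of any particular toolkit.

```python
from collections import Counter

# Hypothetical toy corpus: each sentence is a list of (token, part-of-speech tag) pairs.
# The sentences and tag names are invented for illustration only.
tagged_corpus = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("a", "DET"), ("dog", "NOUN"), ("sleeps", "VERB")],
]


def token_frequencies(corpus):
    """Count how often each token occurs across all sentences."""
    return Counter(token for sentence in corpus for token, _tag in sentence)


def tag_frequencies(corpus):
    """Count how often each part-of-speech tag occurs."""
    return Counter(tag for sentence in corpus for _token, tag in sentence)


def check_rule(corpus, first_tag, second_tag):
    """Among first_tag tokens that have a successor, return the
    fraction that are directly followed by second_tag."""
    hits = 0
    total = 0
    for sentence in corpus:
        # Pair every token with the one that follows it in the same sentence.
        for (_, tag), (_, next_tag) in zip(sentence, sentence[1:]):
            if tag == first_tag:
                total += 1
                if next_tag == second_tag:
                    hits += 1
    return hits / total if total else 0.0


if __name__ == "__main__":
    print(token_frequencies(tagged_corpus).most_common(2))
    print(tag_frequencies(tagged_corpus))
    print(check_rule(tagged_corpus, "DET", "NOUN"))
```

Counting occurrences corresponds to the statistical analysis mentioned above, and `check_rule` is one simple way to test a candidate linguistic rule (here, "a determiner is followed by a noun") against the samples. A real corpus would be far larger and would normally be loaded from files rather than written inline.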
