Corpus Linguistics - Methods

Methods

Corpus Linguistics has generated a number of research methods that attempt to trace a path from data to theory. Wallis and Nelson (2001) first introduced what they called the 3A perspective: Annotation, Abstraction and Analysis.

  • Annotation consists of the application of a scheme to texts. Annotations may include structural markup, part-of-speech tagging, parsing, and numerous other representations.
  • Abstraction consists of the translation (mapping) of terms in the scheme to terms in a theoretically motivated model or dataset. Abstraction typically involves linguist-directed search, but may also include, for example, rule-learning for parsers.
  • Analysis consists of statistically probing, manipulating and generalising from the dataset. Analysis might include statistical evaluations, optimisation of rule-bases or knowledge discovery methods.
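
The three stages can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy lexicon `TOY_LEXICON`, the tag names, and the function names `annotate`, `abstract` and `analyse` are hypothetical and are not taken from Wallis and Nelson. A real project would use a trained tagger or parser and a proper statistical evaluation.

```python
from collections import Counter

# Hypothetical toy lexicon standing in for a real annotation scheme.
TOY_LEXICON = {
    "the": "DET", "a": "DET",
    "dog": "NOUN", "cat": "NOUN", "park": "NOUN",
    "chased": "VERB",
    "in": "PREP",
}


def annotate(text):
    """Annotation: apply the scheme to a text, giving (word, tag) pairs."""
    return [(word, TOY_LEXICON.get(word, "UNK")) for word in text.lower().split()]


def abstract(tagged):
    """Abstraction: map tagged words onto cases in a dataset.

    Each DET + NOUN sequence becomes one case that records whether the
    determiner is definite or indefinite.
    """
    cases = []
    for (w1, t1), (w2, t2) in zip(tagged, tagged[1:]):
        if t1 == "DET" and t2 == "NOUN":
            kind = "definite" if w1 == "the" else "indefinite"
            cases.append({"determiner": kind, "noun": w2})
    return cases


def analyse(cases):
    """Analysis: summarise the dataset as proportions per determiner type."""
    counts = Counter(case["determiner"] for case in cases)
    total = sum(counts.values())
    return {kind: n / total for kind, n in counts.items()} if total else {}


text = "the dog chased a cat in the park"
tagged = annotate(text)
cases = abstract(tagged)
proportions = analyse(cases)
print(proportions)
```

Keeping the stages as separate functions mirrors the point of the 3A perspective: the annotation can be reused while the abstraction and analysis change with the research question.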

Most lexical corpora today are part-of-speech-tagged (POS-tagged). However, even corpus linguists who work with 'unannotated plain text' inevitably apply some method to isolate the terms they are interested in from the surrounding words. In such situations, annotation and abstraction are combined in a lexical search.
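
A lexical search over plain text might look like the following sketch. The sample sentence, the search pattern and the function name `lexical_search` are assumptions chosen for illustration. The pattern does both jobs at once: it decides which word forms count as instances of the term (annotation) and collects them as cases with their context (abstraction).

```python
import re


def lexical_search(text, pattern, width=20):
    """Return (left context, match, right context) for each hit."""
    hits = []
    for m in re.finditer(pattern, text, flags=re.IGNORECASE):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        hits.append((left, m.group(0), right))
    return hits


text = "She runs daily. He ran yesterday. They were running late."
# Hypothetical pattern listing the forms of the verb "run" by hand.
pattern = r"\b(?:run|runs|ran|running)\b"

hits = lexical_search(text, pattern)
forms = [match.lower() for _, match, _ in hits]
for left, match, right in hits:
    print(f"{left:>20} [{match}] {right}")
```

Listing forms by hand is only workable for a few terms, which is one reason most lexical corpora are distributed with POS tags already applied.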

The advantage of publishing an annotated corpus is that other users can then perform experiments on it. Linguists whose interests and perspectives differ from the originators' can exploit this work. By sharing data, corpus linguists are able to treat the corpus as a locus of linguistic debate, rather than as an exhaustive fount of knowledge.
