Corpus-assisted Discourse Studies - Comparison With Standard Corpus Linguistics

Traditional corpus linguistics has, quite naturally, tended to privilege the quantitative approach. In the drive to produce more authentic dictionaries and grammars of a language, it has been characterised by the compilation of very large corpora of heterogeneric discourse types. The aim is an overview of the greatest possible quantity and variety of discourse types, in other words, of the chimerical but useful fiction called the “general language” (“general English”, “general Italian”, and so on). This has led to the construction of immensely valuable research tools such as the Bank of English and the British National Corpus.

Corpus linguistics proper has also frequently treated the corpus as a “black box”: the analyst is not encouraged to become familiar with particular texts within the corpus, in case the special features of those texts distort his or her conception of the corpus as a whole. One argument runs that, if we are to construct from scratch a fresh descriptive model of the language, based as closely as possible on the observation of authentic discourse in action, we need, grammatically speaking, a mental tabula rasa to free ourselves of the baleful prejudice exerted by traditional models and allow the data to speak entirely for itself.

The aim of CADS, on the other hand, is radically different: it is to acquaint oneself as much as possible with the discourse type(s) in hand. Unusually for corpus linguistics, CADS researchers typically engage with their corpus in a variety of ways. As well as from wordlists and concordancing, intuitions for further research can arise from reading, watching or listening to parts of the data set, a process which can help provide a feel for how things are done linguistically in the discourse type being studied.

CADS is also typically characterised by the compilation of ad hoc specialised corpora, since very frequently no collection of the discourse type in question is already available. Just as typically, other corpora of various descriptions are used in the course of a study for purposes of comparison. These may be pre-existing corpora or may themselves need to be compiled by the researcher. In some sense, all work with corpora, just as all work with discourse, is properly comparative. Even when a single corpus is employed, it is used to test the data it contains against another body of data. This may consist of the researcher’s intuitions, of the data found in reference works such as dictionaries and grammars, or of statements made by previous authors in the field. Corpus-assisted studies of discourse types are, of course, by definition comparative: it is only possible to both uncover and evaluate the particular features of a discourse type by comparing it with others.

Occasionally it is possible to compare the behaviour of the linguistic items under study in a single discourse type (or monogeneric) corpus with their behaviour in one of the large heterogeneric corpora which are commercially available, such as the British National Corpus or the Bank of English mentioned earlier. On other occasions, however, it becomes appropriate to adopt more complex procedures and to edit, tailor or compile a corpus for special purposes.
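
As a rough illustration of the simpler kind of comparison, the sketch below normalises the frequency of an item to occurrences per million words, so that counts from a small monogeneric corpus and a much larger reference corpus can be set side by side. The tokenizer, the function names and the per-million scale are assumptions made for the example; they are not taken from any particular corpus tool.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens.

    A deliberately crude tokenizer (assumption): runs of letters and
    apostrophes count as words; everything else is a separator.
    """
    return re.findall(r"[a-z']+", text.lower())


def per_million(item, tokens):
    """Relative frequency of `item` per million tokens (0.0 for an empty corpus)."""
    if not tokens:
        return 0.0
    return Counter(tokens)[item] * 1_000_000 / len(tokens)


def compare_item(item, study_tokens, reference_tokens):
    """Return the normalised frequency of `item` in each corpus as a pair."""
    return per_million(item, study_tokens), per_million(item, reference_tokens)
```

For example, `compare_item("war", tokenize(study_text), tokenize(reference_text))` returns two figures on the same scale, where `study_text` and `reference_text` stand for whatever texts the researcher has loaded. Raw counts alone would be misleading when the corpora differ greatly in size.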

A basic, standard methodology in CADS may resemble the following:

Step 1: Decide upon the research question;

Step 2: Choose, compile or edit an appropriate corpus;

Step 3: Choose, compile or edit an appropriate reference corpus / corpora;

Step 4: Make frequency lists and run a keywords comparison of the corpora;

Step 5: Determine the existence of sets of key items;

Step 6: Concordance interesting key items (with differing quantities of co-text);

Step 7: (Possibly) refine the research question and return to Step 2.

This basic procedure can of course vary according to individual research circumstances and requirements.
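
Steps 4 to 6 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure of any specific concordancing package: the keyness statistic used here is log-likelihood, one commonly used choice that the steps above do not themselves specify, and the tokenizer, function names and default values are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Crude tokenizer (assumption): lowercase runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def log_likelihood(a, b, c, d):
    """Log-likelihood keyness score.

    a, b: frequency of the item in the study and reference corpus.
    c, d: total token counts of the study and reference corpus.
    """
    e1 = c * (a + b) / (c + d)  # expected frequency in the study corpus
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    score = 0.0
    if a > 0:
        score += a * math.log(a / e1)
    if b > 0:
        score += b * math.log(b / e2)
    return 2 * score


def keywords(study_tokens, reference_tokens, top=10):
    """Step 4: build frequency lists and rank items by keyness.

    Only items relatively more frequent in the study corpus are kept.
    Both corpora are assumed to be non-empty.
    """
    study, reference = Counter(study_tokens), Counter(reference_tokens)
    c, d = len(study_tokens), len(reference_tokens)
    scored = []
    for word, a in study.items():
        b = reference[word]
        if a / c > b / d:
            scored.append((word, log_likelihood(a, b, c, d)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top]


def concordance(tokens, node, width=4):
    """Step 6: key-word-in-context lines with `width` tokens of co-text per side."""
    lines = []
    for i, token in enumerate(tokens):
        if token == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{token}] {right}")
    return lines
```

Step 5, grouping key items into sets, is left to the analyst, since it depends on reading the output rather than computing it. Varying `width` in `concordance` corresponds to examining key items with differing quantities of co-text.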
