Brown Corpus - Sample Distribution

The Corpus consists of 500 samples, distributed across 15 genres in rough proportion to the amount published in 1961 in each of those genres. All works sampled were published in 1961; as far as could be determined they were first published then, and were written by native speakers of American English.

Each sample began at a random sentence-boundary in the article or other unit chosen, and continued up to the first sentence boundary after 2,000 words. In a very few cases miscounts led to samples being just under 2,000 words.
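The sampling rule described above can be sketched in a few lines. This is an illustration only, not the compilers' actual procedure: the function name `draw_sample`, the representation of a text as a list of sentence strings, and whitespace-based word counting are all assumptions made for the example.

```python
import random


def draw_sample(sentences, target_words=2000, rng=None):
    """Pick a random sentence boundary, then take whole sentences until
    the running word count first reaches target_words.

    `sentences` is a list of sentence strings (an assumed representation).
    Words are counted by whitespace splitting (also an assumption).
    Returns the chosen sentences and their total word count.
    """
    rng = rng or random.Random()
    start = rng.randrange(len(sentences))  # random sentence boundary
    sample, count = [], 0
    for sentence in sentences[start:]:
        sample.append(sentence)
        count += len(sentence.split())
        if count >= target_words:
            break  # first sentence boundary at or after the target
    return sample, count
```

Because the sample always ends on a sentence boundary, it normally runs slightly over 2,000 words. In this sketch a sample comes out short only if the source text ends before the target is reached.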

The original data entry was done on keypunch machines that supported upper case only; capitals were indicated by a preceding asterisk, and various special items such as formulae also had their own codes.
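To make the asterisk convention concrete, here is a minimal sketch of how such text might be decoded. It assumes that an asterisk marks only the single letter that follows it and that every other letter is lower case; the function name `decode_capitals` and the input string are illustrative, and the special codes for formulae and similar items are not handled.

```python
import re


def decode_capitals(keyed):
    """Restore mixed case from upper-case-only keyed text.

    Assumption: '*' immediately before a letter means that letter is a
    capital; all other letters are lower case.
    """
    lowered = keyed.lower()
    # Replace each '*x' with 'X', dropping the asterisk.
    return re.sub(r"\*([a-z])", lambda m: m.group(1).upper(), lowered)


print(decode_capitals("*THE CORPUS WAS COMPILED AT *BROWN *UNIVERSITY."))
# prints: The corpus was compiled at Brown University.
```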

The corpus originally (1961) contained 1,014,312 words sampled from 15 text categories:

  • A. PRESS: Reportage (44 texts)
    • Political
    • Sports
    • Society
    • Spot News
    • Financial
    • Cultural
  • B. PRESS: Editorial (27 texts)
    • Institutional Daily
    • Personal
    • Letters to the Editor
  • C. PRESS: Reviews (17 texts)
    • Theatre
    • Books
    • Music
    • Dance
  • D. RELIGION (17 texts)
    • Books
    • Periodicals
    • Tracts
  • E. SKILLS AND HOBBIES (36 texts)
    • Books
    • Periodicals
  • F. POPULAR LORE (48 texts)
    • Books
    • Periodicals
  • G. BELLES-LETTRES - Biography, Memoirs, etc. (75 texts)
    • Books
    • Periodicals
  • H. MISCELLANEOUS: US Government & House Organs (30 texts)
    • Government Documents
    • Foundation Reports
    • Industry Reports
    • College Catalog
    • Industry House Organ
  • J. LEARNED (80 texts)
    • Natural Sciences
    • Medicine
    • Mathematics
    • Social and Behavioral Sciences
    • Political Science, Law, Education
    • Humanities
    • Technology and Engineering
  • K. FICTION: General (29 texts)
    • Novels
    • Short Stories
  • L. FICTION: Mystery and Detective Fiction (24 texts)
    • Novels
    • Short Stories
  • M. FICTION: Science (6 texts)
    • Novels
    • Short Stories
  • N. FICTION: Adventure and Western (29 texts)
    • Novels
    • Short Stories
  • P. FICTION: Romance and Love Story (29 texts)
    • Novels
    • Short Stories
  • R. HUMOR (9 texts)
    • Novels
    • Essays, etc.
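As a quick consistency check, the per-category counts listed above can be tallied. The snippet below uses only the figures given in this section; the variable names are illustrative.

```python
# Texts per category, copied from the list above.
texts_per_category = {
    "A": 44, "B": 27, "C": 17, "D": 17, "E": 36,
    "F": 48, "G": 75, "H": 30, "J": 80, "K": 29,
    "L": 24, "M": 6, "N": 29, "P": 29, "R": 9,
}

total_texts = sum(texts_per_category.values())
total_words = 1_014_312  # corpus size stated above

print(len(texts_per_category))             # 15 categories
print(total_texts)                         # 500 samples
print(round(total_words / total_texts, 1)) # 2028.6 words per sample on average
```

The counts sum to the stated 500 samples, and the average of about 2,029 words per sample fits the rule that each sample runs to the first sentence boundary after 2,000 words.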
