Brown Corpus - Sample Distribution

The corpus consists of 500 samples, distributed across 15 genres in rough proportion to the amount published in 1961 in each of those genres. All works sampled were published in 1961; as far as could be determined, they were first published then and were written by native speakers of American English.

Each sample began at a random sentence boundary in the article or other unit chosen and continued up to the first sentence boundary after 2,000 words. In a very few cases, miscounts left samples just under 2,000 words.
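
The sampling rule can be sketched in a few lines of Python. This is a minimal illustration, not the compilers' actual procedure: it assumes the unit has already been split into a list of sentence strings and that words are counted by splitting on whitespace, and the function name `draw_sample` is hypothetical.

```python
import random


def draw_sample(sentences, target_words=2000, rng=None):
    """Hypothetical sketch of the sampling rule: start at a randomly chosen
    sentence boundary and keep whole sentences until the running word count
    first reaches target_words, so the sample always ends on a boundary.

    Assumptions: `sentences` is a pre-segmented list of sentence strings,
    and words are counted by whitespace splitting.
    """
    if rng is None:
        rng = random.Random()
    start = rng.randrange(len(sentences))  # random sentence boundary
    sample, word_count = [], 0
    for sentence in sentences[start:]:
        sample.append(sentence)
        word_count += len(sentence.split())
        if word_count >= target_words:
            break  # first sentence boundary at or after the target
    # If the unit runs out before the target, this sketch just returns
    # a short sample rather than choosing a new starting point.
    return sample, word_count
```

Passing a seeded generator, for example `draw_sample(sentences, rng=random.Random(1961))`, makes a draw reproducible. Because the loop only stops after a whole sentence, a sample normally overshoots 2,000 words slightly.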

The original data entry was done on uppercase-only keypunch machines, so each capital letter was marked by a preceding asterisk; various special items, such as formulae, were also given special codes.
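
As an illustration of the asterisk convention, the sketch below rebuilds mixed-case text from uppercase-only input. It assumes that every letter not preceded by an asterisk should be lowercase and handles nothing but that one convention; the corpus's other special codes (for formulae and similar items) are not modeled, and the name `decode_capitals` is hypothetical.

```python
import re

# An asterisk immediately followed by a letter marks that letter as a capital.
_CAPITAL_MARK = re.compile(r"\*([a-z])")


def decode_capitals(keypunched):
    """Hypothetical sketch: restore capitals from asterisk-marked text.

    Assumes all unmarked letters are lowercase. Asterisks not followed by
    a letter are left untouched, as are any other special codes.
    """
    lowered = keypunched.lower()
    # Replace each "*x" with "X", dropping the asterisk.
    return _CAPITAL_MARK.sub(lambda m: m.group(1).upper(), lowered)


print(decode_capitals("*THE *CORPUS CONSISTS OF 500 SAMPLES."))
# The Corpus consists of 500 samples.
```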

The original corpus contained 1,014,312 words, drawn from texts published in 1961 and sampled from 15 text categories:

  • A. PRESS: Reportage (44 texts)
    • Political
    • Sports
    • Society
    • Spot News
    • Financial
    • Cultural
  • B. PRESS: Editorial (27 texts)
    • Institutional Daily
    • Personal
    • Letters to the Editor
  • C. PRESS: Reviews (17 texts)
    • Theatre
    • Books
    • Music
    • Dance
  • D. RELIGION (17 texts)
    • Books
    • Periodicals
    • Tracts
  • E. SKILL AND HOBBIES (36 texts)
    • Books
    • Periodicals
  • F. POPULAR LORE (48 texts)
    • Books
    • Periodicals
  • G. BELLES-LETTRES - Biography, Memoirs, etc. (75 texts)
    • Books
    • Periodicals
  • H. MISCELLANEOUS: US Government & House Organs (30 texts)
    • Government Documents
    • Foundation Reports
    • Industry Reports
    • College Catalog
    • Industry House Organ
  • J. LEARNED (80 texts)
    • Natural Sciences
    • Medicine
    • Mathematics
    • Social and Behavioral Sciences
    • Political Science, Law, Education
    • Humanities
    • Technology and Engineering
  • K. FICTION: General (29 texts)
    • Novels
    • Short Stories
  • L. FICTION: Mystery and Detective Fiction (24 texts)
    • Novels
    • Short Stories
  • M. FICTION: Science (6 texts)
    • Novels
    • Short Stories
  • N. FICTION: Adventure and Western (29 texts)
    • Novels
    • Short Stories
  • P. FICTION: Romance and Love Story (29 texts)
    • Novels
    • Short Stories
  • R. HUMOR (9 texts)
    • Novels
    • Essays, etc.
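
As a quick arithmetic check on the figures above, a short Python sketch can tally the per-category counts, confirm they add up to the 500 samples, and divide the stated 1,014,312 words by that total. The dictionary and function names and the shortened labels are illustrative only.

```python
# Counts of texts per category, transcribed from the list above
# (labels shortened for brevity; names are illustrative).
CATEGORY_COUNTS = {
    "A": ("Press: Reportage", 44),
    "B": ("Press: Editorial", 27),
    "C": ("Press: Reviews", 17),
    "D": ("Religion", 17),
    "E": ("Skill and Hobbies", 36),
    "F": ("Popular Lore", 48),
    "G": ("Belles-Lettres", 75),
    "H": ("Miscellaneous", 30),
    "J": ("Learned", 80),
    "K": ("Fiction: General", 29),
    "L": ("Fiction: Mystery and Detective", 24),
    "M": ("Fiction: Science", 6),
    "N": ("Fiction: Adventure and Western", 29),
    "P": ("Fiction: Romance and Love Story", 29),
    "R": ("Humor", 9),
}
TOTAL_WORDS = 1_014_312  # word total stated for the original corpus


def summarize(counts, total_words):
    """Return total texts, each category's share, and mean words per text."""
    total_texts = sum(n for _, n in counts.values())
    shares = {code: n / total_texts for code, (_, n) in counts.items()}
    return total_texts, shares, total_words / total_texts


texts, shares, mean_words = summarize(CATEGORY_COUNTS, TOTAL_WORDS)
print(texts)                 # 500
print(f"{shares['J']:.1%}")  # 16.0%
print(mean_words)            # 2028.624
```

The mean of about 2,029 words per text sits slightly above 2,000, which is consistent with each sample running on to the first sentence boundary past 2,000 words.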
