Calgary Corpus - Contents

In its most commonly used form, the corpus consists of 14 files totaling 3,141,622 bytes, as listed below.

Size (bytes)  File name  Description
     111,261  BIB        ASCII text in UNIX "refer" format - 725 bibliographic references.
     768,771  BOOK1      Unformatted ASCII text - Thomas Hardy: Far from the Madding Crowd.
     610,856  BOOK2      ASCII text in UNIX "troff" format - Witten: Principles of Computer Speech.
     102,400  GEO        32-bit numbers in IBM floating point format - seismic data.
     377,109  NEWS       ASCII text - USENET batch file on a variety of topics.
      21,504  OBJ1       VAX executable program - compilation of PROGP.
     246,814  OBJ2       Macintosh executable program - "Knowledge Support System".
      53,161  PAPER1     UNIX "troff" format - Witten, Neal, Cleary: Arithmetic Coding for Data Compression.
      82,199  PAPER2     UNIX "troff" format - Witten: Computer (in)security.
     513,216  PIC        1728 x 2376 bitmap image (MSB first) - text in French and line diagrams.
      39,611  PROGC      Source code in C - UNIX compress v4.0.
      71,646  PROGL      Source code in Lisp - system software.
      49,379  PROGP      Source code in Pascal - program to evaluate PPM compression.
      93,695  TRANS      ASCII and control characters - transcript of a terminal session.
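
The sizes listed above can be cross-checked against the stated total with a few lines of Python. This is only a sketch: the names SIZES, total_size and bitmap_bytes are illustrative, and the PIC check assumes the bitmap is stored at 1 bit per pixel with no header, which the listing does not state explicitly.

```python
# Per-file sizes in bytes, copied from the listing above.
SIZES = {
    "BIB": 111_261,
    "BOOK1": 768_771,
    "BOOK2": 610_856,
    "GEO": 102_400,
    "NEWS": 377_109,
    "OBJ1": 21_504,
    "OBJ2": 246_814,
    "PAPER1": 53_161,
    "PAPER2": 82_199,
    "PIC": 513_216,
    "PROGC": 39_611,
    "PROGL": 71_646,
    "PROGP": 49_379,
    "TRANS": 93_695,
}


def total_size(sizes):
    """Return the sum of all per-file sizes, in bytes."""
    return sum(sizes.values())


def bitmap_bytes(width, height, bits_per_pixel=1):
    """Bytes needed for a packed, headerless bitmap.

    Assumption: 1 bit per pixel by default, and the total bit count
    is a multiple of 8 (true for 1728 x 2376).
    """
    return width * height * bits_per_pixel // 8


if __name__ == "__main__":
    print(len(SIZES), "files,", total_size(SIZES), "bytes")
    print("PIC at 1 bit per pixel:", bitmap_bytes(1728, 2376), "bytes")
```

Under those assumptions, the sum agrees with the 3,141,622-byte total, and the 1728 x 2376 bitmap accounts for the full listed size of PIC.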

There is also a less commonly used 18-file version, which includes 4 additional text files in UNIX "troff" format, PAPER3 through PAPER6.
