Contents
In its most commonly used form, the Calgary Corpus consists of 14 files totaling 3,141,622 bytes, as follows.
Size (bytes) | File name | Description |
---|---|---|
111,261 | BIB | ASCII text in UNIX "refer" format - 725 bibliographic references. |
768,771 | BOOK1 | Unformatted ASCII text - Thomas Hardy: Far from the Madding Crowd. |
610,856 | BOOK2 | ASCII text in UNIX "troff" format - Witten: Principles of Computer Speech. |
102,400 | GEO | 32-bit numbers in IBM floating point format - seismic data. |
377,109 | NEWS | ASCII text - USENET batch file on a variety of topics. |
21,504 | OBJ1 | VAX executable program - compilation of PROGP. |
246,814 | OBJ2 | Macintosh executable program - "Knowledge Support System". |
53,161 | PAPER1 | UNIX "troff" format - Witten, Neal, Cleary: Arithmetic Coding for Data Compression. |
82,199 | PAPER2 | UNIX "troff" format - Witten: Computer (in)security. |
513,216 | PIC | 1728 x 2376 bitmap image (MSB first): text in French and line diagrams. |
39,611 | PROGC | Source code in C - UNIX compress v4.0. |
71,646 | PROGL | Source code in Lisp - system software. |
49,379 | PROGP | Source code in Pascal - program to evaluate PPM compression. |
93,695 | TRANS | ASCII and control characters - transcript of a terminal session. |
There is also a less commonly used 18-file version, which includes 4 additional text files in UNIX "troff" format: PAPER3 through PAPER6.
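The sizes in the 14-file table can be cross-checked with a little arithmetic. The following is a minimal sketch, not an official tool: it records the listed sizes, confirms that they sum to the stated total of 3,141,622 bytes, and checks that PIC's size matches a 1728 x 2376 bitmap at 1 bit per pixel. The helper `check_local_copy` and the lowercase on-disk file names (`bib`, `book1`, ...) are assumptions about how a local copy might be laid out. PAPER3 through PAPER6 are omitted because no sizes are given for them.

```python
"""Sketch: cross-check the Calgary Corpus sizes listed in the table.

Assumption (not from the table): a local copy stores each file under its
lowercase name (bib, book1, ...) in a single directory.
"""
from pathlib import Path

# Sizes in bytes, copied from the 14-file table.
CALGARY_SIZES = {
    "BIB": 111_261,
    "BOOK1": 768_771,
    "BOOK2": 610_856,
    "GEO": 102_400,
    "NEWS": 377_109,
    "OBJ1": 21_504,
    "OBJ2": 246_814,
    "PAPER1": 53_161,
    "PAPER2": 82_199,
    "PIC": 513_216,
    "PROGC": 39_611,
    "PROGL": 71_646,
    "PROGP": 49_379,
    "TRANS": 93_695,
}


def total_size(sizes):
    """Sum of all listed file sizes in bytes."""
    return sum(sizes.values())


def check_local_copy(corpus_dir):
    """Hypothetical helper: map each mismatched file to its on-disk size.

    A missing file is reported with the value None.
    """
    mismatches = {}
    for name, expected in CALGARY_SIZES.items():
        path = Path(corpus_dir) / name.lower()  # assumed lowercase file names
        actual = path.stat().st_size if path.is_file() else None
        if actual != expected:
            mismatches[name] = actual
    return mismatches


if __name__ == "__main__":
    print(len(CALGARY_SIZES), total_size(CALGARY_SIZES))  # 14 3141622
    # PIC: 1728 x 2376 pixels at 1 bit per pixel, 8 pixels per byte.
    print(1728 * 2376 // 8)  # 513216
    # GEO: number of 32-bit (4-byte) values.
    print(CALGARY_SIZES["GEO"] // 4)  # 25600
```

An empty dictionary from `check_local_copy("path/to/calgary")` would mean every file on disk matches the size listed in the table.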