Some Notable Text Corpora
English language:
- Google N-Grams Corpus - Largest English corpus at 155 billion words. Also has corpora for other languages. (http://ngrams.googlelabs.com/datasets)
- American National Corpus
- Bank of English
- British National Corpus
- Corpus Juris Secundum
- Corpus of Contemporary American English (COCA) - 425 million words, 1990-2011. Freely searchable online.
- Brown Corpus, forming part of the "Brown Family" of corpora, together with LOB, Frown and F-LOB.
- International Corpus of English
- Oxford English Corpus
- Scottish Corpus of Texts & Speech
Other languages:
- Hamshahri Corpus (Persian, a.k.a. Farsi)
- Amarna letters (for Akkadian, Egyptian, Sumerograms, etc.)
- TEP: Tehran English-Persian Parallel Corpus (http://ece.ut.ac.ir/nlp/)
- TMC: Tehran Monolingual Corpus, a standard corpus for Persian language modeling (http://ece.ut.ac.ir/nlp/)
- Bijankhan Corpus - A contemporary Persian corpus for NLP research
- Bulgarian National Corpus (http://search.dcl.bas.bg)
- CETENFolha
- Croatian Language Corpus
- Croatian National Corpus
- Czech National Corpus
- Neo-Assyrian Text Corpus Project
- Russian National Corpus
- Slovenian National Corpus
- Thesaurus Linguae Graecae (Ancient Greek)
- Quranic Arabic Corpus (Classical Arabic)
- Eastern Armenian National Corpus (EANC) - 110 million words. Freely searchable online.
- National Corpus of Polish
- German Reference Corpus (DeReKo) - More than 4 billion words of contemporary written German.
- Tatoeba - A parallel corpus containing about 913,000 sentences in 90 languages.
- Spanish text corpus by Molino de Ideas, which contains 660 million words. (Spanish)
- Kotonoha Japanese language corpus
- CorALit: the Corpus of Academic Lithuanian - Academic texts published in 1999-2009 (approx. 9 million words). Compiled at the University of Vilnius, Lithuania.