Some Notable Text Corpora
English language:
- Google N-Grams Corpus - Largest English corpus at 155 billion words. Also has corpora for other languages. (http://ngrams.googlelabs.com/datasets)
- American National Corpus
- Bank of English
- British National Corpus
- Corpus Juris Secundum
- Corpus of Contemporary American English (COCA): 425 million words, 1990-2011. Freely searchable online.
- Brown Corpus, forming part of the "Brown Family" of corpora, together with LOB, Frown and F-LOB.
- International Corpus of English
- Oxford English Corpus
- Scottish Corpus of Texts & Speech
Other languages:
- Hamshahri Corpus (Persian a.k.a. Farsi)
- Amarna letters (for Akkadian, Egyptian, Sumerograms, etc.)
- TEP: Tehran English-Persian Parallel Corpus (http://ece.ut.ac.ir/nlp/)
- TMC: Tehran Monolingual Corpus, a standard corpus for Persian language modeling (http://ece.ut.ac.ir/nlp/)
- Bijankhan Corpus: a contemporary Persian corpus for NLP research
- Bulgarian National Corpus (http://search.dcl.bas.bg)
- CETENFolha
- Croatian Language Corpus
- Croatian National Corpus
- Czech National Corpus
- Neo-Assyrian Text Corpus Project
- Russian National Corpus
- Slovenian National Corpus
- Thesaurus Linguae Graecae (Ancient Greek)
- Quranic Arabic Corpus (Classical Arabic)
- Eastern Armenian National Corpus (EANC): 110 million words. Freely searchable online.
- National Corpus of Polish
- German Reference Corpus (DeReKo): more than 4 billion words of contemporary written German.
- Tatoeba: a parallel corpus containing about 913,000 sentences in 90 languages.
- Spanish text corpus by Molino de Ideas, which contains 660 million words (Spanish).
- Kotonoha Japanese language corpus
- CorALit: the Corpus of Academic Lithuanian. Academic texts published in 1999-2009 (approx. 9 million words), compiled at the University of Vilnius, Lithuania.