Research Collections
The IRF provides a number of test data collections developed by the IRF itself, by one of its members, or by third parties. These collections can be used freely for scientific experimentation.
The MAtrixware REsearch Collection (MAREC) is the first standardised patent data corpus for research purposes. It consists of 19 million patent documents in different languages, normalised to a highly specific XML format. The collection was developed by Matrixware for the IRF.
The ClueWeb09 collection is a 25-terabyte dataset of about 1 billion web pages crawled in January and February 2009. It was created by the Language Technologies Institute at Carnegie Mellon University to support research on information retrieval and related human language technologies.