Research Collections
The Information Retrieval Facility (IRF) provides a number of test data collections developed by the IRF itself, by its members, or by third parties. These collections can be used freely for scientific experimentation.
The MAtrixware REsearch Collection (MAREC) is the first standardised patent data corpus for research purposes. It consists of 19 million patent documents in different languages, normalised to a highly specific XML format. The collection has been developed by Matrixware for the IRF.
The ClueWeb09 collection is a 25-terabyte dataset of about 1 billion web pages crawled in January and February 2009. It was created by the Language Technologies Institute at Carnegie Mellon University to support research on information retrieval and related human language technologies.