Information Retrieval Facility - Research Collections

Research Collections

The IRF provides a number of test data collections developed by the IRF itself, by one of its members, or by third parties. These collections can be used freely for scientific experimentation.

The MAtrixware REsearch Collection (MAREC) is the first standardised patent data corpus for research purposes. It consists of 19 million patent documents in different languages, normalised to a highly specific XML format. The collection has been developed by Matrixware for the IRF.

The ClueWeb09 collection is a 25-terabyte dataset of about 1 billion web pages crawled in January and February 2009. It was created by the Language Technologies Institute at Carnegie Mellon University to support research on information retrieval and related human language technologies.
