Query Expansion - Precision and Recall Tradeoffs

Search engines use query expansion to improve the quality of search results. The underlying assumption is that users do not always formulate queries using the best terms; a term may be a poor choice simply because the database does not contain it.

Stemming a user-entered term matches more documents, because alternate forms of the word are matched as well, which increases total recall. This comes at the expense of precision. Expanding a query to also search for synonyms of a user-entered term likewise increases recall at the expense of precision. The reason lies in how precision is calculated: it is the number of relevant documents retrieved divided by the total number of documents retrieved, so every additional document that expansion retrieves enlarges the denominator, and precision falls unless the added documents are relevant at least as often as the original results. It is also inferred that a larger result set negatively impacts overall search result quality, since many users do not want more results to comb through, regardless of the precision.
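The effect on the two measures can be illustrated with a small calculation. The following is a minimal sketch, assuming the standard set-based definitions of precision and recall; the document ids and the `precision_recall` helper are hypothetical and chosen only for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved document ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)          # relevant documents that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical toy data: four documents are actually relevant to the query.
relevant = {1, 2, 3, 4}

# Results for the query as typed: 2 of the 3 retrieved documents are relevant.
original_results = {1, 2, 9}

# Results after stemming/synonym expansion: one more relevant document is
# found, but two more irrelevant ones are retrieved as well.
expanded_results = {1, 2, 3, 9, 10, 11}

before = precision_recall(original_results, relevant)
after = precision_recall(expanded_results, relevant)

print(f"before expansion: precision={before[0]:.2f}, recall={before[1]:.2f}")
print(f"after expansion:  precision={after[0]:.2f}, recall={after[1]:.2f}")
```

In this toy case recall rises from 0.50 to 0.75 while precision drops from about 0.67 to 0.50, because only one of the three newly retrieved documents is relevant.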

The goal of query expansion in this regard is that, by increasing recall, precision can potentially increase rather than decrease, because the result set now includes pages that are more relevant (of higher quality), or at least equally relevant. Pages that could be more relevant to the user's intended query, and that would have been excluded without query expansion regardless of their relevance, are now included. At the same time, many current commercial search engines use term frequency (tf-idf) to assist in ranking. By ranking on occurrences of the user-entered words together with their synonyms and alternate morphological forms, documents with a higher density of these terms (high frequency and close proximity) tend to migrate higher in the search results, which improves quality near the top of the results despite the larger result set.
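The ranking idea can be sketched in a few lines. This is a simplified illustration, not how any particular search engine works: it assumes whitespace tokenization and one common tf-idf variant (term frequency normalized by document length, idf computed as log(N/df) + 1), and it models only frequency, not term proximity. The `expansions` table, the sample documents, and the `tf_idf_scores` function are hypothetical.

```python
import math
from collections import Counter


def tf_idf_scores(query_terms, docs):
    """Score each document by the summed tf-idf weight of the query terms."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        score = 0.0
        for term in query_terms:
            # Document frequency: how many documents contain the term.
            df = sum(1 for other in tokenized if term in other)
            if df == 0 or not tokens:
                continue
            tf = counts[term] / len(tokens)
            # Adding 1 keeps terms that occur in every document from scoring zero.
            idf = math.log(n_docs / df) + 1.0
            score += tf * idf
        scores.append(score)
    return scores


docs = [
    "cars for sale",                       # doc 0
    "automobile automobile repair guide",  # doc 1
    "weather report today",                # doc 2
]

# Hypothetical expansion table standing in for a stemmer and thesaurus.
expansions = {"car": ["cars", "automobile"]}

query = ["car"]
expanded = query + [alt for term in query for alt in expansions.get(term, [])]

plain_scores = tf_idf_scores(query, docs)
expanded_scores = tf_idf_scores(expanded, docs)

# Document indices ordered from highest to lowest expanded score.
ranking = sorted(range(len(docs)), key=lambda i: expanded_scores[i], reverse=True)
print(plain_scores)
print(ranking)
```

Without expansion, the literal term "car" matches none of the sample documents, so every score is zero. With expansion, the two car-related documents are scored and the one with the denser use of an expanded term ranks first, while the unrelated document stays at the bottom.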

This tradeoff is one of the defining problems in query expansion: whether it is worthwhile to perform, given its uncertain effects on precision and recall. Critics argue that the dictionaries, thesauri, and stemming algorithms are shaped by human bias, and that although the query expansion algorithm applies them automatically, they influence the results in a way that is not itself automated (similar to how statisticians can 'lie' with statistics). Other critics point to the potential for corporate influence on the dictionaries, which could be used to promote advertised web pages in web search engines.
