Full Text Search - False-positive Problem

False-positive Problem

Free text searching is likely to retrieve many documents that are not relevant to the intended search question. Such documents are called false positives (see Type I error). The retrieval of irrelevant documents is often caused by the inherent ambiguity of natural language. In the sample diagram at right, false positives are represented by the irrelevant results (red dots) that were returned by the search (on a light-blue background).

Clustering techniques based on Bayesian algorithms can help reduce false positives. For a search term of "football", clustering can be used to categorize the document/data universe into "American football", "corporate football", etc. Depending on the occurrences of words relevant to the categories, search terms a search result can be placed in one or more of the categories. This technique is being extensively deployed in the e-discovery domain.

Read more about this topic:  Full Text Search

Famous quotes containing the word problem:

    Will women find themselves in the same position they have always been? Or do we see liberation as solving the conditions of women in our society?... If we continue to shy away from this problem we will not be able to solve it after independence. But if we can say that our first priority is the emancipation of women, we will become free as members of an oppressed community.
    Ruth Mompati (b. 1925)