Document Retrieval - Variations

Variations

There are two main classes of indexing schemata for document retrieval systems: form based (or word based), and content based indexing. The document classification scheme (or indexing algorithm) in use determines the nature of the document retrieval system.

Form based document retrieval addresses the exact syntactic properties of a text, comparable to substring matching in string searches. The text is generally unstructured and not necessarily in a natural language, the system could for example be used to process large sets of chemical representations in molecular biology. A suffix tree algorithm is an example for form based indexing.

The content based approach exploits semantic connections between documents and parts thereof, and semantic connections between queries and documents. Most content based document retrieval systems use an inverted index algorithm.

Read more about this topic:  Document Retrieval

Famous quotes containing the word variations:

    I may be able to spot arrowheads on the desert but a refrigerator is a jungle in which I am easily lost. My wife, however, will unerringly point out that the cheese or the leftover roast is hiding right in front of my eyes. Hundreds of such experiences convince me that men and women often inhabit quite different visual worlds. These are differences which cannot be attributed to variations in visual acuity. Man and women simply have learned to use their eyes in very different ways.
    Edward T. Hall (b. 1914)