Okapi BM25 - IDF Information Theoretic Interpretation

Here is an interpretation from information theory. Suppose a query term $q$ appears in $n(q)$ documents. Then a randomly picked document $D$ will contain the term with probability $\frac{n(q)}{N}$ (where $N$ is again the cardinality of the set of documents in the collection). Therefore, the information content of the message "$D$ contains $q$" is:

$$-\log \frac{n(q)}{N} = \log \frac{N}{n(q)}.$$

Now suppose we have two query terms $q_1$ and $q_2$. If the two terms occur in documents entirely independently of each other, then the probability of seeing both $q_1$ and $q_2$ in a randomly picked document $D$ is:

$$\frac{n(q_1)}{N} \cdot \frac{n(q_2)}{N},$$

and the information content of such an event is:

$$\sum_{i=1}^{2} \log \frac{N}{n(q_i)}.$$

With a small variation, this is exactly what is expressed by the IDF component of BM25.
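
The argument can be checked numerically. The sketch below is illustrative only: the collection size and document frequencies are made-up values, natural logarithms are assumed, and `bm25_idf` uses one commonly quoted smoothed form of the BM25 IDF, ln((N - n + 0.5) / (n + 0.5) + 1), which is not stated in this section and is taken here as an assumption.

```python
import math


def information_content(n_q: int, n_docs: int) -> float:
    """Self-information (in nats) of the event 'a random document contains q'."""
    return -math.log(n_q / n_docs)


def bm25_idf(n_q: int, n_docs: int) -> float:
    """Assumed smoothed BM25 IDF: ln((N - n + 0.5) / (n + 0.5) + 1).

    Algebraically this equals ln((N + 1) / (n + 0.5)), which is why it
    stays close to ln(N / n) when N and n are not tiny.
    """
    return math.log((n_docs - n_q + 0.5) / (n_q + 0.5) + 1.0)


# Hypothetical collection size and document frequencies.
N = 1_000_000
n_q1, n_q2 = 100, 20_000

# Under independence the joint probability is the product of the marginals,
# so the information content of the joint event is the sum of the parts.
joint_probability = (n_q1 / N) * (n_q2 / N)
joint_information = -math.log(joint_probability)
summed_information = information_content(n_q1, N) + information_content(n_q2, N)

print("joint:", joint_information, "sum:", summed_information)
for n_q in (n_q1, n_q2):
    print(n_q, information_content(n_q, N), bm25_idf(n_q, N))
```

With these values the joint information content matches the sum of the per-term values up to floating-point error, and the smoothed IDF differs from $\log \frac{N}{n(q)}$ only slightly, which is the "small variation" in question.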
