Okapi BM25 - IDF Information Theoretic Interpretation

IDF Information Theoretic Interpretation

Here is an interpretation from information theory. Suppose a query term $q$ appears in $n(q)$ documents. Then a randomly picked document $D$ will contain the term with probability $\frac{n(q)}{N}$ (where $N$ is again the cardinality of the set of documents in the collection). Therefore, the information content of the message "$D$ contains $q$" is:

$$-\log \frac{n(q)}{N} = \log \frac{N}{n(q)}.$$

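Now suppose we have two query terms $q_1$ and $q_2$, appearing in $n(q_1)$ and $n(q_2)$ documents respectively. If the two terms occur in documents entirely independently of each other, then the probability of seeing both $q_1$ and $q_2$ in a randomly picked document $D$ is:

$$\frac{n(q_1)}{N} \cdot \frac{n(q_2)}{N},$$

and the information content of such an event is:

$$\sum_{i=1}^{2} \log \frac{N}{n(q_i)}.$$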
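
The additivity under independence can be checked numerically. The following is a minimal sketch, not part of the original article: the collection size and the two document frequencies are made-up values, and `information_content` is a hypothetical helper name. It uses natural logarithms, so the results are in nats.

```python
import math


def information_content(doc_freq: int, num_docs: int) -> float:
    """Self-information (in nats) of 'a random document contains the term'.

    doc_freq is the number of documents containing the term and
    num_docs is the total number of documents in the collection.
    """
    if not 0 < doc_freq <= num_docs:
        raise ValueError("need 0 < doc_freq <= num_docs")
    return math.log(num_docs / doc_freq)


# Hypothetical numbers chosen only for illustration.
num_docs = 1000
doc_freq_q1 = 10
doc_freq_q2 = 100

# Probability that a random document contains both terms,
# assuming the two terms occur independently of each other.
p_both = (doc_freq_q1 / num_docs) * (doc_freq_q2 / num_docs)

# Information content of the joint event, computed two ways.
joint_information = -math.log(p_both)
summed_information = (
    information_content(doc_freq_q1, num_docs)
    + information_content(doc_freq_q2, num_docs)
)

print(joint_information, summed_information)
```

The two printed values agree up to floating-point rounding: under the independence assumption, the information content of seeing both terms is the sum of the per-term values of the logarithm of collection size over document frequency.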

With a small variation, this is exactly what is expressed by the IDF component of BM25.
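
To make the "small variation" concrete, here is a hedged sketch comparing the plain information content with one commonly cited form of the BM25 IDF, which adds 0.5 to numerator and denominator and uses the number of documents that do not contain the term in the numerator. That formula is not stated in this section, so treat it as an assumption here; the function names and the sample numbers are hypothetical as well.

```python
import math


def plain_idf(doc_freq: int, num_docs: int) -> float:
    """Information content log(N / n(q)) from the interpretation above."""
    return math.log(num_docs / doc_freq)


def bm25_idf(doc_freq: int, num_docs: int) -> float:
    """One commonly cited BM25 IDF form (assumed, not taken from this section):
    log((N - n(q) + 0.5) / (n(q) + 0.5)).
    """
    return math.log((num_docs - doc_freq + 0.5) / (doc_freq + 0.5))


# Hypothetical collection size and document frequencies.
num_docs = 1_000_000
for doc_freq in (10, 1_000, 100_000, 600_000):
    print(
        f"n(q)={doc_freq:>7}  "
        f"plain={plain_idf(doc_freq, num_docs):8.4f}  "
        f"bm25={bm25_idf(doc_freq, num_docs):8.4f}"
    )
```

For rare terms the two quantities are nearly identical. They diverge as a term becomes common, and this particular BM25 form turns negative once a term appears in more than half of the documents, which is why some implementations clamp the value or add 1 inside the logarithm.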
