IDF Information-Theoretic Interpretation
Here is an interpretation from information theory. Suppose a query term $q$ appears in $n(q)$ documents. Then a randomly picked document $D$ will contain the term with probability $\frac{n(q)}{N}$ (where $N$ is again the cardinality of the set of documents in the collection). Therefore, the information content of the message "$D$ contains $q$" is:
$$-\log \frac{n(q)}{N} = \log \frac{N}{n(q)}.$$
Now suppose we have two query terms $q_1$ and $q_2$. If the two terms occur in documents entirely independently of each other, then the probability of seeing both $q_1$ and $q_2$ in a randomly picked document $D$ is:
$$\frac{n(q_1)}{N} \cdot \frac{n(q_2)}{N},$$
and the information content of such an event is:
$$\sum_{i=1}^{2} \log \frac{N}{n(q_i)}.$$
With a small variation, this is exactly what is expressed by the IDF component of BM25.
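The additivity argument can be checked numerically. The following is a minimal sketch, not a reference implementation: the function names and the collection sizes are hypothetical, and `bm25_idf` assumes the commonly cited smoothed form $\log \frac{N - n(q) + 0.5}{n(q) + 0.5}$ as the "small variation", which this section does not itself spell out.

```python
import math


def information_content(n_q: int, n_docs: int) -> float:
    """Information (in nats) of the message "D contains q": log(N / n(q))."""
    if not 0 < n_q <= n_docs:
        raise ValueError("need 0 < n_q <= n_docs")
    return math.log(n_docs / n_q)


def joint_information(doc_freqs: list[int], n_docs: int) -> float:
    """Sum of per-term information; valid only if terms occur independently."""
    return sum(information_content(n_q, n_docs) for n_q in doc_freqs)


def bm25_idf(n_q: int, n_docs: int) -> float:
    """Assumed smoothed variant: log((N - n(q) + 0.5) / (n(q) + 0.5))."""
    return math.log((n_docs - n_q + 0.5) / (n_q + 0.5))


# Hypothetical collection: N = 1000 documents, q1 in 10 of them, q2 in 100.
N = 1000
n_q1, n_q2 = 10, 100

# Under independence, P(q1 and q2) is the product of the two probabilities,
# so its negative log equals the sum of the per-term information contents.
joint_probability = (n_q1 / N) * (n_q2 / N)
from_probability = -math.log(joint_probability)
from_sum = joint_information([n_q1, n_q2], N)

print(from_probability, from_sum)
print(information_content(n_q1, N), bm25_idf(n_q1, N))
```

For a rare term the smoothed variant stays close to $\log \frac{N}{n(q)}$; the two diverge as $n(q)$ approaches $N/2$, where the smoothed form reaches zero and then turns negative.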