IDF Information Theoretic Interpretation
Here is an interpretation from information theory. Suppose a query term $q$ appears in $n(q)$ documents. Then a randomly picked document $D$ will contain the term with probability $\frac{n(q)}{N}$ (where $N$ is again the cardinality of the set of documents in the collection). Therefore, the information content of the message "$D$ contains $q$" is:
$$-\log \frac{n(q)}{N} = \log \frac{N}{n(q)}.$$
Now suppose we have two query terms $q_1$ and $q_2$. If the two terms occur in documents entirely independently of each other, then the probability of seeing both $q_1$ and $q_2$ in a randomly picked document $D$ is:
$$\frac{n(q_1)}{N} \cdot \frac{n(q_2)}{N},$$
and the information content of such an event is:
$$\sum_{i=1}^{2} \log \frac{N}{n(q_i)}.$$
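The additivity under independence can be checked numerically. The following is a minimal Python sketch, not part of the original derivation: the function name `information_content` and the collection size and document counts are made-up illustrative values, and the natural logarithm is assumed (the text does not fix a base).

```python
import math


def information_content(n_q: int, N: int) -> float:
    """Information (in nats) of the message 'D contains q'.

    n_q: number of documents containing the term (hypothetical input).
    N:   total number of documents in the collection.
    """
    if not 0 < n_q <= N:
        raise ValueError("need 0 < n_q <= N")
    return math.log(N / n_q)


# Illustrative values only: a 1000-document collection and two terms.
N = 1000
n_q1, n_q2 = 10, 100

# Probability that a random document contains both terms,
# assuming the terms occur independently.
p_both = (n_q1 / N) * (n_q2 / N)

# Information content of the joint event, computed two ways.
joint_info = -math.log(p_both)
sum_info = information_content(n_q1, N) + information_content(n_q2, N)

print(joint_info, sum_info)
```

The two printed values agree up to floating-point error, which is the point of the derivation: under independence, per-term information contents simply add.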
With a small variation, this is exactly what is expressed by the IDF component of BM25.
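The "small variation" is not spelled out in this section. As a rough sketch, the code below compares the plain information-theoretic quantity with one commonly cited form of the BM25 IDF, $\log \frac{N - n(q) + 0.5}{n(q) + 0.5}$. That formula is an assumption here, as are the function names `plain_idf` and `bm25_idf` and the sample counts; other BM25 variants add further smoothing.

```python
import math


def plain_idf(n_q: int, N: int) -> float:
    """Information content log(N / n(q)) from the derivation above."""
    return math.log(N / n_q)


def bm25_idf(n_q: int, N: int) -> float:
    """Assumed classic BM25 IDF with 0.5 smoothing (not taken from this section)."""
    return math.log((N - n_q + 0.5) / (n_q + 0.5))


# Illustrative collection size and document frequencies.
N = 1_000_000
for n_q in (10, 1_000, 100_000, 600_000):
    print(n_q, round(plain_idf(n_q, N), 3), round(bm25_idf(n_q, N), 3))
```

For rare terms the two values are close. They diverge as a term becomes common, and this assumed form turns negative once a term appears in more than half of the documents.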