Okapi BM25 - The Ranking Function

The Ranking Function

BM25 is a bag-of-words retrieval function that ranks a set of documents based on the query terms appearing in each document, regardless of the inter-relationship between the query terms within a document (e.g., their relative proximity). It is not a single function but a family of scoring functions with slightly different components and parameters. One of the most prominent instantiations of the function is as follows.

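Given a query $Q$, containing keywords $q_1, \ldots, q_n$, the BM25 score of a document $D$ is:

$$\text{score}(D, Q) = \sum_{i=1}^{n} \text{IDF}(q_i) \cdot \frac{f(q_i, D) \cdot (k_1 + 1)}{f(q_i, D) + k_1 \cdot \left(1 - b + b \cdot \frac{|D|}{\text{avgdl}}\right)}$$

where $f(q_i, D)$ is $q_i$'s term frequency in the document $D$, $|D|$ is the length of the document $D$ in words, and $\text{avgdl}$ is the average document length in the text collection from which documents are drawn. $k_1$ and $b$ are free parameters, usually chosen, in absence of an advanced optimization, as $k_1 \in [1.2, 2.0]$ and $b = 0.75$. $\text{IDF}(q_i)$ is the IDF (inverse document frequency) weight of the query term $q_i$. It is usually computed as:

$$\text{IDF}(q_i) = \log \frac{N - n(q_i) + 0.5}{n(q_i) + 0.5}$$

where $N$ is the total number of documents in the collection, and $n(q_i)$ is the number of documents containing $q_i$.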
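The scoring rule described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: it assumes documents are already tokenized into lists of words, uses the IDF form log((N - n + 0.5) / (n + 0.5)), and picks k1 = 1.2 and b = 0.75 as defaults. The function names `idf` and `bm25_score` and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def idf(term, docs):
    """IDF weight of a term: log((N - n + 0.5) / (n + 0.5)).

    N is the number of documents, n the number of documents containing the term.
    """
    N = len(docs)
    n = sum(1 for d in docs if term in d)
    return math.log((N - n + 0.5) / (n + 0.5))


def bm25_score(query, doc, docs, k1=1.2, b=0.75):
    """BM25 score of one tokenized document for a tokenized query.

    k1 and b are the free parameters; the defaults here are assumptions.
    """
    avgdl = sum(len(d) for d in docs) / len(docs)  # average document length
    tf = Counter(doc)                              # term frequencies in this document
    score = 0.0
    for term in query:
        f = tf[term]
        # Length-normalized saturation of the term frequency.
        denom = f + k1 * (1 - b + b * len(doc) / avgdl)
        score += idf(term, docs) * f * (k1 + 1) / denom
    return score


# Toy corpus (illustrative only): each document is a list of tokens.
docs = [
    ["okapi", "bm25", "ranking"],
    ["ranking", "function", "for", "search"],
    ["cats", "and", "dogs"],
    ["dogs", "chase", "cats"],
    ["search", "engines", "rank", "documents"],
]
query = ["bm25", "ranking"]
scores = [bm25_score(query, d, docs) for d in docs]
```

In this toy corpus the first document contains both query terms and scores highest, the second contains only one and scores lower, and documents with no query term score zero. Because only term counts and document lengths enter the computation, reordering the words inside a document does not change its score.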

There are several interpretations for IDF and slight variations on its formula. In the original BM25 derivation, the IDF component is derived from the Binary Independence Model.

Note that the above formula for IDF has a potentially major drawback for terms that appear in more than half of the documents in the corpus. The IDF of such a term is negative, so the term contributes negatively to the final document score: of two almost-identical documents, one containing the term and one not, the one without the term may receive the larger score. This is often undesirable, so many real-world applications handle this IDF formula in a different way:

  • Each summand can be given a floor of 0, so that common terms are trimmed out;
  • The IDF function can be given a floor of a positive constant, so that common terms are not ignored altogether;
  • The IDF function can be replaced with a similarly shaped one that is non-negative, or strictly positive so that no term is ignored altogether.
