Bloom Filter - Probability of False Positives

Probability of False Positives

Assume that a hash function selects each array position with equal probability. If m is the number of bits in the array, and k is the number of hash functions, then the probability that a certain bit is not set to 1 by a certain hash function during the insertion of an element is then

The probability that it is not set to 1 by any of the hash functions is

If we have inserted n elements, the probability that a certain bit is still 0 is

the probability that it is 1 is therefore

Now test membership of an element that is not in the set. Each of the k array positions computed by the hash functions is 1 with a probability as above. The probability of all of them being 1, which would cause the algorithm to erroneously claim that the element is in the set, is often given as

This is not strictly correct as it assumes independence for the probabilities of each bit being set. However, assuming it is a close approximation we have that the probability of false positives decreases as m (the number of bits in the array) increases, and increases as n (the number of inserted elements) increases. For a given m and n, the value of k (the number of hash functions) that minimizes the probability is

which gives the false positive probability of

The required number of bits m, given n (the number of inserted elements) and a desired false positive probability p (and assuming the optimal value of k is used) can be computed by substituting the optimal value of k in the probability expression above:

which can be simplified to:

This results in:

This means that for a given false positive probability p, the length of a Bloom filter m is proportionate to the number of elements being filtered n. While the above formula is asymptotic (i.e. applicable as m,n → ∞), the agreement with finite values of m,n is also quite good; the false positive probability for a finite bloom filter with m bits, n elements, and k hash functions is at most

So we can use the asymptotic formula if we pay a penalty for at most half an extra element and at most one fewer bit.

Read more about this topic:  Bloom Filter

Famous quotes containing the words probability of, probability and/or false:

    Liberty is a blessing so inestimable, that, wherever there appears any probability of recovering it, a nation may willingly run many hazards, and ought not even to repine at the greatest effusion of blood or dissipation of treasure.
    David Hume (1711–1776)

    The source of Pyrrhonism comes from failing to distinguish between a demonstration, a proof and a probability. A demonstration supposes that the contradictory idea is impossible; a proof of fact is where all the reasons lead to belief, without there being any pretext for doubt; a probability is where the reasons for belief are stronger than those for doubting.
    Andrew Michael Ramsay (1686–1743)

    All that we call ideal in Greek or any other art, because to us it is false and visionary, was, to the makers of it, true and existent.
    John Ruskin (1819–1900)