Noncentral Hypergeometric Distributions


In statistics, the hypergeometric distribution is the discrete probability distribution generated by picking colored balls at random from an urn without replacement.
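For reference, the ordinary (central) hypergeometric probabilities can be computed directly from binomial coefficients. The sketch below is a minimal Python illustration. It assumes an urn with m1 balls of one color and m2 balls of another, from which n balls are drawn without replacement. The names hypergeom_pmf, m1, m2 and n are chosen here for illustration and do not come from any particular library.

```python
from math import comb

def hypergeom_pmf(x: int, m1: int, m2: int, n: int) -> float:
    """Probability of getting exactly x balls of color 1 when n balls are
    drawn without replacement from an urn holding m1 balls of color 1 and
    m2 balls of color 2 (hypothetical helper, for illustration only)."""
    # Outside the support the probability is zero.
    if x < max(0, n - m2) or x > min(n, m1):
        return 0.0
    # Ways to choose x color-1 balls and n - x color-2 balls,
    # divided by all ways to choose n balls from the urn.
    return comb(m1, x) * comb(m2, n - x) / comb(m1 + m2, n)

# Example: an urn with 5 red and 5 white balls, 4 balls drawn.
probs = [hypergeom_pmf(x, 5, 5, 4) for x in range(5)]
print(probs)
```

Every ball has the same chance of being picked here, so there is no bias. The noncentral variants discussed below relax exactly this assumption.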

Various generalizations of this distribution exist for cases where the picking of colored balls is biased, so that balls of one color are more likely to be picked than balls of another color.

This can be illustrated by the following example. Assume that an opinion poll is conducted by calling random telephone numbers. Unemployed people are more likely to be home and answer the phone than employed people are. Therefore, unemployed respondents are likely to be over-represented in the sample. The probability distribution of employed versus unemployed respondents in a sample of n respondents can be described as a noncentral hypergeometric distribution.

The description of biased urn models is complicated by the fact that there is more than one noncentral hypergeometric distribution. Which distribution you get depends on whether items (e.g. colored balls) are sampled one by one in a manner where the items compete with each other, or are sampled independently of each other.
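The two sampling mechanisms can be contrasted with a small Monte Carlo sketch. It assumes an urn with m1 balls of color 1 and m2 balls of color 2, where a color-1 ball has odds (weight) w relative to a color-2 ball. All function and parameter names are hypothetical. In the competitive model, a fixed number n of balls is drawn one at a time. In the independent model, each ball is taken or left on its own, and only runs whose total happens to equal n are kept. This is plain rejection sampling, chosen for clarity rather than speed.

```python
import random
from collections import Counter

def competitive_draw(m1, m2, w, n, rng):
    """Draw n balls one at a time without replacement. At every draw each
    remaining color-1 ball has weight w and each color-2 ball weight 1.
    Returns the number of color-1 balls drawn."""
    r1, r2, x = m1, m2, 0
    for _ in range(n):
        if rng.random() < w * r1 / (w * r1 + r2):
            x += 1
            r1 -= 1
        else:
            r2 -= 1
    return x

def independent_draw(m1, m2, w, rng):
    """Take each ball independently: a color-1 ball with odds w, a color-2
    ball with odds 1. The total number taken is random."""
    p1, p2 = w / (1 + w), 0.5
    x1 = sum(rng.random() < p1 for _ in range(m1))
    x2 = sum(rng.random() < p2 for _ in range(m2))
    return x1, x2

def simulate(m1=10, m2=10, w=3.0, n=12, trials=5000, seed=1):
    rng = random.Random(seed)
    competitive = Counter(competitive_draw(m1, m2, w, n, rng) for _ in range(trials))
    # Condition the independent model on the total turning out to be n.
    independent = Counter()
    while sum(independent.values()) < trials:
        x1, x2 = independent_draw(m1, m2, w, rng)
        if x1 + x2 == n:
            independent[x1] += 1
    return competitive, independent

if __name__ == "__main__":
    comp, indep = simulate()
    for name, c in (("competitive", comp), ("independent", indep)):
        mean = sum(k * v for k, v in c.items()) / sum(c.values())
        print(f"{name}: mean number of color-1 balls = {mean:.3f}")
```

With w = 1 both mechanisms reduce to the ordinary hypergeometric distribution. For other odds the two histograms generally differ, even though the same urn and the same odds are used.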

There is widespread confusion about this fact. The name noncentral hypergeometric distribution has been used for two different distributions, and several scientists have used the wrong distribution or erroneously believed that the two distributions were identical.

The use of the same name for two different distributions has been possible because these two distributions were studied by two different groups of scientists with hardly any contact with each other.

Agner Fog (2007, 2008) has suggested that the best way to avoid confusion is to use the name Wallenius' noncentral hypergeometric distribution for a biased urn model in which a predetermined number of items are drawn one by one in a competitive manner, and the name Fisher's noncentral hypergeometric distribution for a model in which items are drawn independently of each other, so that the total number of items drawn is known only after the experiment. The names refer to Kenneth Ted Wallenius and R. A. Fisher, who were the first to describe the respective distributions.
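To make the distinction concrete, both probability mass functions can be computed exactly for small urns. The sketch below uses the hypothetical notation m1 and m2 for the numbers of balls of each color, n for the number drawn, and w for the odds ratio of color 1 versus color 2. Fisher's probabilities use the standard closed form, in which P(X = x) is proportional to C(m1, x) C(m2, n - x) w^x. Wallenius' probabilities are obtained here by a simple dynamic program over the sequence of draws rather than by a closed-form expression. This is a choice made for readability and is only practical for small urns.

```python
from math import comb

def fisher_pmf(m1, m2, n, w):
    """Fisher's noncentral hypergeometric pmf as a dict {x: P(X = x)}."""
    lo, hi = max(0, n - m2), min(n, m1)
    g = {x: comb(m1, x) * comb(m2, n - x) * w**x for x in range(lo, hi + 1)}
    total = sum(g.values())
    return {x: v / total for x, v in g.items()}

def wallenius_pmf(m1, m2, n, w):
    """Wallenius' noncentral hypergeometric pmf, computed by following the
    n competitive draws one at a time (requires w > 0, n <= m1 + m2)."""
    dist = {0: 1.0}  # dist[i] = P(i color-1 balls among the draws so far)
    for k in range(n):  # k balls have been drawn so far
        nxt = {}
        for i, p in dist.items():
            r1, r2 = m1 - i, m2 - (k - i)  # balls of each color still in the urn
            p1 = w * r1 / (w * r1 + r2)    # chance the next ball is color 1
            if r1 > 0:
                nxt[i + 1] = nxt.get(i + 1, 0.0) + p * p1
            if r2 > 0:
                nxt[i] = nxt.get(i, 0.0) + p * (1 - p1)
        dist = nxt
    return dict(sorted(dist.items()))

if __name__ == "__main__":
    # Small case: 2 balls of each color, 2 drawn, odds w = 2.
    print(fisher_pmf(2, 2, 2, 2.0))     # P(X=2) = 4/13 ≈ 0.308
    print(wallenius_pmf(2, 2, 2, 2.0))  # P(X=2) = 1/3 ≈ 0.333
```

Even in this tiny example the two distributions disagree, although both describe the same urn with the same odds. With w = 1 both functions return the central hypergeometric probabilities.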

Fisher's noncentral hypergeometric distribution has previously been given the name extended hypergeometric distribution, but this name is rarely used in the scientific literature, except in handbooks that need to distinguish between the two distributions. Some scientists are strongly opposed to using this name.

Given this confusion, a thorough explanation of the difference between the two noncentral hypergeometric distributions is needed.
