Twelvefold Way - Sampling

Sampling

Another way to think of some of the cases is in terms of sampling, in statistics. Imagine a population of X items (or people), of which we choose N. Two different schemes are normally described, known as "sampling with replacement" and "sampling without replacement". In the former case (sampling with replacement), once we've chosen an item, we put it back in the population, so that we might choose it again. The result is that each choice is independent of all the other choices, and the set of samples is technically referred to as independent identically distributed. In the latter case, however, once we've chosen an item, we put it aside so that we can't choose it again. This means that the act of choosing an item has an effect on all the following choices (the particular item can't be seen again), so our choices are dependent one another.

In the terminology below, the case of sampling with replacement is termed "Any f", while the case of sampling without replacement is termed "Injective f". Each box indicates how many different sets of choices there are, in a particular sampling scheme. The row labeled "Distinct" means that the ordering matters. For example, if we have ten items, of which we choose two, then the choice (4,7) is different from (7,4). On the other hand, the row labeled "S_n orders" means that ordering doesn't matter: Choice (4,7) and (7,4) are equivalent. (Another way to think of this is to sort each choice by the item number, and throw out any duplicates that result.) In terms of probability distributions, sampling with replacement where ordering matters is comparable to describing the joint distribution of N separate random variables, each with an X-fold categorical distribution. The case where ordering doesn't matter, however, is comparable to describing a single multinomial distribution of N draws from an X-fold category, where only the number seen of each category matters. The case where ordering doesn't matter and sampling is without replacement is comparable to a single multivariate hypergeometric distribution, and the fourth possibility does not seem to have a correspondence. Note that in all the "injective" cases (i.e. sampling without replacement), must apply or the number of sets of choices is 0. ("Comparable" in the above cases means that each element of the sample space of the corresponding distribution corresponds to a separate set of choices, and hence the number in the appropriate box indicates the size of the sample space for the given distribution.)

From this perspective, the case labeled "Surjective f" is somewhat strange: Essentially, we keep sampling with replacement until we've chosen each item at least once. Then, we count how many choices we've made, and if it's not equal to N, throw out the entire set and repeat. This is vaguely comparable to the coupon collector's problem, where the process involves "collecting" (by sampling with replacement) a set of X coupons until each coupon has been seen at least once. Note that in all "surjective" cases, must apply or the number of sets of choices is 0.

Read more about this topic: Twelvefold Way