Chinese Restaurant Process - Generalization

Generalization

This construction can be generalized to a model with two parameters, α and θ, commonly called the discount and strength (or concentration) parameters. At time n + 1, the next customer to arrive finds |B| occupied tables and decides to sit at an empty table with probability


\dfrac{\theta + |B| \alpha}{n + \theta},

or at an occupied table b of size |b| with probability


\dfrac{|b| - \alpha}{n + \theta}.

In order for the construction to define a valid probability measure it is necessary to suppose that either α < 0 and θ = −Lα for some L ∈ {1, 2, ...}; or that 0 ≤ α ≤ 1 and θ > −α.
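The seating rule and the parameter conditions above can be sketched in Python. This is a minimal illustration, not a reference implementation: the function names `validate_params` and `seat_customers`, the numerical tolerances, and the choice to represent the state only by the list of table sizes are all assumptions made for the example.

```python
import random

def validate_params(alpha, theta):
    """Raise ValueError unless (alpha, theta) meets the conditions stated above."""
    if alpha < 0:
        ratio = -theta / alpha  # should be a positive integer L
        if ratio < 0.5 or abs(ratio - round(ratio)) > 1e-9:
            raise ValueError("for alpha < 0, theta must equal -L*alpha with L in {1, 2, ...}")
    elif alpha > 1 or theta <= -alpha:
        raise ValueError("need 0 <= alpha <= 1 and theta > -alpha")

def seat_customers(n, alpha, theta, rng=None):
    """Seat n customers one at a time and return the list of table sizes."""
    validate_params(alpha, theta)
    rng = rng or random.Random()
    sizes = []
    for seated in range(n):  # `seated` customers are already in the restaurant
        if seated == 0:
            sizes.append(1)  # the first customer always opens the first table
            continue
        # Unnormalized weights: |b| - alpha for each occupied table,
        # theta + |B| * alpha for a new table; they sum to seated + theta.
        weights = [s - alpha for s in sizes]
        weights.append(theta + len(sizes) * alpha)
        # Drop options whose weight is zero up to rounding (assumed tolerance),
        # e.g. a new table once |B| = L in the alpha < 0 case.
        options = [(i, w) for i, w in enumerate(weights) if w > 1e-12]
        choice = rng.choices([i for i, _ in options],
                             weights=[w for _, w in options])[0]
        if choice == len(sizes):
            sizes.append(1)
        else:
            sizes[choice] += 1
    return sizes
```

For example, `seat_customers(100, 0.5, 1.0, random.Random(0))` returns table sizes summing to 100, and with α = −1, θ = 3 the number of tables never exceeds L = 3.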

Under this model the probability assigned to any particular partition B of the set {1, ..., n}, in terms of the Pochhammer k-symbol, is


\Pr(B_n = B) = \dfrac{(\theta + \alpha)_{|B|-1, \alpha}}{(\theta+1)_{n-1, 1}} \prod_{b\in B}(1-\alpha)_{|b|-1, 1}

where, by convention, (a)_{0,c} = 1, and for b > 0


(a)_{b,c} = \prod_{i=0}^{b-1}(a+ic) =
\begin{cases}
a^b & \text{if }c = 0, \\ \\
\dfrac{c^b\,\Gamma(a/c + b)}{\Gamma(a/c)} & \text{otherwise}.
\end{cases}
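As a sanity check on the partition probability, the product form of the Pochhammer k-symbol can be evaluated directly. The sketch below uses illustrative names (`pochhammer`, `partition_probability`, `set_partitions`) that are assumptions of this example; it enumerates every set partition of a small set and confirms that the probabilities sum to 1.

```python
from math import prod

def pochhammer(a, b, c):
    """(a)_{b,c} = prod_{i=0}^{b-1} (a + i*c); the empty product (b = 0) is 1."""
    return prod(a + i * c for i in range(b))

def partition_probability(block_sizes, alpha, theta):
    """Pr(B_n = B) for a partition with the given block sizes."""
    n = sum(block_sizes)
    k = len(block_sizes)
    numerator = pochhammer(theta + alpha, k - 1, alpha)
    denominator = pochhammer(theta + 1, n - 1, 1)
    blocks = prod(pochhammer(1 - alpha, s - 1, 1) for s in block_sizes)
    return numerator / denominator * blocks

def set_partitions(items):
    """Yield every partition of the list `items` as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Put `first` into each existing block in turn ...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ... or into a new block of its own.
        yield [[first]] + partition

# The 52 partitions of a 5-element set should carry total probability 1.
total = sum(
    partition_probability([len(b) for b in p], alpha=0.5, theta=1.0)
    for p in set_partitions(list(range(5)))
)
```

With α = 0.5 and θ = 1, the single-block partition of {1, 2} has probability (1 − α)/(1 + θ) = 0.25, which matches the seating rule for the second customer.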

Thus, for the case when θ > 0 and 0 < α < 1, the partition probability can be expressed in terms of the Gamma function as


\Pr(B_n = B) =\dfrac{\Gamma(\theta)}{\Gamma(\theta+n)}\dfrac{\alpha^{|B|}\,\Gamma(\theta/\alpha + |B|) }{\Gamma(\theta/\alpha)}\prod_{b\in B}\dfrac{\Gamma(|b|-\alpha)}{\Gamma(1-\alpha)}.
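For large n the Gamma functions overflow quickly, so in practice it can be convenient to evaluate this expression in log space. The following is a sketch under the assumption that θ > 0 and 0 < α < 1, so that every Gamma-function argument is positive; the function name `log_partition_probability` is hypothetical.

```python
from math import lgamma, log, exp

def log_partition_probability(block_sizes, alpha, theta):
    """log Pr(B_n = B) from the Gamma-function form (theta > 0, 0 < alpha < 1)."""
    if not (theta > 0 and 0 < alpha < 1):
        raise ValueError("this form assumes theta > 0 and 0 < alpha < 1")
    n = sum(block_sizes)
    k = len(block_sizes)
    # Gamma(theta) / Gamma(theta + n)
    out = lgamma(theta) - lgamma(theta + n)
    # alpha^|B| * Gamma(theta/alpha + |B|) / Gamma(theta/alpha)
    out += k * log(alpha) + lgamma(theta / alpha + k) - lgamma(theta / alpha)
    # product over blocks of Gamma(|b| - alpha) / Gamma(1 - alpha)
    out += sum(lgamma(s - alpha) - lgamma(1 - alpha) for s in block_sizes)
    return out

# Two customers at one table, alpha = 0.5, theta = 1: (1 - alpha) / (1 + theta)
p_joined = exp(log_partition_probability([2], 0.5, 1.0))
```

Working with `lgamma` keeps the computation stable even when the partition has thousands of elements, at the cost of returning a log-probability.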

In the one-parameter case, where α is zero, this simplifies to


\Pr(B_n = B) = \dfrac{\Gamma(\theta)\,\theta^{|B|}}{\Gamma(\theta+n)}\prod_{b\in B} \Gamma(|b|).

Or, when θ is zero,


\Pr(B_n = B) =\dfrac{\alpha^{|B|-1}\,\Gamma(|B|) }{{\Gamma(n)}}\prod_{b\in B}\dfrac{\Gamma(|b|-\alpha)}{\Gamma(1-\alpha)}.

As before, the probability assigned to any particular partition depends only on the block sizes, so the random partition is exchangeable in the sense described above. The consistency property still holds by construction.
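The dependence on block sizes alone can be checked numerically by multiplying the seating probabilities in arrival order. In this sketch the function name `sequential_probability` and the encoding of a partition as a sequence of table labels are assumptions made for illustration.

```python
from math import isclose

def sequential_probability(labels, alpha, theta):
    """Probability of a seating, where labels[i] is the table of customer i + 1."""
    sizes = {}
    prob = 1.0
    for n, table in enumerate(labels):  # n customers are already seated
        if n == 0:
            pass  # the first customer opens a table with probability 1
        elif table in sizes:
            prob *= (sizes[table] - alpha) / (n + theta)
        else:
            prob *= (theta + len(sizes) * alpha) / (n + theta)
        sizes[table] = sizes.get(table, 0) + 1
    return prob

# Three different partitions of {1, 2, 3}, all with block sizes {2, 1}.
p_aab = sequential_probability("aab", alpha=0.5, theta=1.0)
p_aba = sequential_probability("aba", alpha=0.5, theta=1.0)
p_abb = sequential_probability("abb", alpha=0.5, theta=1.0)
same = isclose(p_aab, p_aba) and isclose(p_aba, p_abb)
```

All three values agree (0.125 for these parameters), even though the customers arrive at the tables in different orders.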

If α = 0, the probability distribution of the random partition of the integer n thus generated is the Ewens distribution with parameter θ, used in population genetics and the unified neutral theory of biodiversity.
