Probit Model - Gibbs Sampling

Gibbs Sampling

Gibbs sampling of a probit model is possible because regression models typically use normal prior distributions over the weights, and this prior is conjugate with the normal distribution of the errors (and hence of the latent variables Y*). The model can be described as:


\begin{array}{lcl}
\boldsymbol\beta &\sim& \mathcal{N}(\mathbf{b}_0, \mathbf{B}_0) \\
y_i^\ast|\mathbf{x}_i,\boldsymbol\beta &\sim& \mathcal{N}(\mathbf{x}'_i\boldsymbol\beta, 1) \\
y_i &=& \begin{cases} 1 & \text{if } y_i^\ast > 0 \\ 0 & \text{otherwise} \end{cases}
\end{array}
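
As an illustration of the generative model above, the following minimal sketch simulates data from it with NumPy. The function name simulate_probit, the sample size, and the coefficient values are assumptions chosen for the example, not part of the model description.

```python
import numpy as np


def simulate_probit(X, beta, rng):
    """Draw latent y* ~ N(X beta, 1), then threshold at zero to obtain y."""
    y_star = X @ beta + rng.standard_normal(X.shape[0])
    y = (y_star > 0).astype(int)  # y_i = 1 if y_i* > 0, otherwise 0
    return y, y_star


# Illustrative (assumed) design matrix and coefficients.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(500), rng.standard_normal(500)])
beta_true = np.array([0.5, -1.0])
y, y_star = simulate_probit(X, beta_true, rng)
```

Only y and X would be observed in practice; y_star is returned here purely so that the thresholding step can be inspected.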

From this, we can determine the full conditional densities needed:


\begin{array}{lcl}
\mathbf{B} &=& (\mathbf{B}_0^{-1} + \mathbf{X}'\mathbf{X})^{-1} \\
\boldsymbol\beta|\mathbf{y}^\ast &\sim& \mathcal{N}(\mathbf{B}(\mathbf{B}_0^{-1}\mathbf{b}_0 + \mathbf{X}'\mathbf{y}^\ast), \mathbf{B}) \\
y_i^\ast|y_i=0,\mathbf{x}_i,\boldsymbol\beta &\sim& \mathcal{N}(\mathbf{x}'_i\boldsymbol\beta, 1)[y_i^\ast \le 0] \\
y_i^\ast|y_i=1,\mathbf{x}_i,\boldsymbol\beta &\sim& \mathcal{N}(\mathbf{x}'_i\boldsymbol\beta, 1)[y_i^\ast > 0]
\end{array}

The result for β is given in the article on Bayesian linear regression, although specified with different notation.
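
The two full conditionals can be alternated to form a sampler. The following is a minimal sketch, assuming NumPy and SciPy are available; the function name probit_gibbs, the starting value, the prior settings, and the synthetic data are illustrative assumptions. The latent variables are drawn with scipy.stats.truncnorm, whose bounds are expressed in standardised units relative to the mean and scale.

```python
import numpy as np
from scipy.stats import truncnorm


def probit_gibbs(X, y, b0, B0, n_iter=1000, seed=0):
    """Gibbs sampler sketch for the probit model; returns (n_iter, k) beta draws."""
    rng = np.random.default_rng(seed)
    n, k = X.shape
    B0_inv = np.linalg.inv(B0)
    B = np.linalg.inv(B0_inv + X.T @ X)  # conditional covariance of beta (constant)
    chol_B = np.linalg.cholesky(B)       # B = chol_B @ chol_B.T
    prior_term = B0_inv @ b0
    beta = np.zeros(k)                   # arbitrary starting value (assumption)
    draws = np.empty((n_iter, k))
    for t in range(n_iter):
        mu = X @ beta
        # Truncation bounds in standardised units (relative to mean mu, sd 1):
        # y_i = 1 keeps y* > 0, y_i = 0 keeps y* <= 0.
        lower = np.where(y == 1, -mu, -np.inf)
        upper = np.where(y == 1, np.inf, -mu)
        y_star = truncnorm.rvs(lower, upper, loc=mu, scale=1.0,
                               size=n, random_state=rng)
        # beta | y* ~ N(B (B0^{-1} b0 + X' y*), B)
        mean = B @ (prior_term + X.T @ y_star)
        beta = mean + chol_B @ rng.standard_normal(k)
        draws[t] = beta
    return draws


# Small synthetic check (all values here are illustrative assumptions).
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(300), rng.standard_normal(300)])
beta_true = np.array([0.3, -0.8])
y = (X @ beta_true + rng.standard_normal(300) > 0).astype(int)
draws = probit_gibbs(X, y, b0=np.zeros(2), B0=10.0 * np.eye(2), n_iter=400)
posterior_mean = draws[100:].mean(axis=0)  # discard the first 100 draws as burn-in
```

Because B does not depend on the latent variables, it is inverted and factorised once outside the loop; only the conditional mean of β changes between iterations.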

The only subtlety lies in the two conditional distributions for y_i*. The notation [y_i* ≤ 0] (and likewise [y_i* > 0]) is the Iverson bracket, sometimes written I(y_i* ≤ 0) or similar. It indicates that the distribution must be truncated to the given range and rescaled appropriately. In this particular case, a truncated normal distribution arises.

How best to sample from this distribution depends on how much of it is truncated. If a large fraction of the original mass remains, rejection sampling is easy: sample a number from the non-truncated distribution and reject it if it falls outside the range imposed by the truncation. If only a small fraction of the original mass remains, however (e.g. when sampling from one of the tails of the normal distribution, as when x_i'β is around 3 or more and a negative sample is desired), most draws are rejected, so this is inefficient and it becomes necessary to fall back on other sampling algorithms. General sampling from the truncated normal can be achieved using approximations to the normal CDF and the probit function, and R packages such as msm provide an rtnorm function for generating truncated-normal samples.
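
The two strategies for drawing a single latent variable can be sketched with the Python standard library alone. The function names, the max_tries cut-off, and the use of reflection for the positive region are assumptions made for this illustration; the inverse-CDF version is one simple way to use the normal CDF and the probit function (statistics.NormalDist.inv_cdf), not the only one.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal; inv_cdf is the probit function


def sample_by_rejection(mu, positive, rng, max_tries=10_000):
    """Draw from N(mu, 1) restricted to (0, inf) if positive, else (-inf, 0]."""
    for _ in range(max_tries):
        draw = rng.gauss(mu, 1.0)
        if (draw > 0) == positive:
            return draw
    raise RuntimeError("acceptance rate too low; use another method")


def sample_by_inverse_cdf(mu, positive, rng):
    """Same target distribution, drawn by inverting the normal CDF."""
    if positive:
        # Reflect: if w ~ N(-mu, 1) restricted to w <= 0, then -w is the
        # desired draw. This keeps tail probabilities small and accurate.
        return -sample_by_inverse_cdf(-mu, False, rng)
    # P(y* <= 0) under N(mu, 1), computed with erfc for accuracy in the tail.
    p0 = 0.5 * math.erfc(mu / math.sqrt(2.0))
    u = rng.random()
    while u == 0.0:  # inv_cdf requires a probability strictly inside (0, 1)
        u = rng.random()
    return mu + STD_NORMAL.inv_cdf(u * p0)


rng = random.Random(0)
easy = [sample_by_rejection(0.5, True, rng) for _ in range(1000)]
tail = [sample_by_inverse_cdf(3.0, False, rng) for _ in range(1000)]
```

With a mean of 3 and a negative sample required, only about 0.13% of unrestricted draws would be accepted, which is why the second list uses the inverse-CDF version rather than rejection.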
