Binomial Regression - Comparison Between Binomial Regression and Binary Choice Models

A binary choice model assumes a latent variable Un, the utility (or net benefit) that person n obtains from taking an action (as opposed to not taking the action). This utility depends on the characteristics of the person, some of which are observed by the researcher and some of which are not:

U_n = \boldsymbol\beta \cdot \mathbf{s_n} + \varepsilon_n

where β is a set of regression coefficients and sn is a set of independent variables (also known as "features") describing person n, which may be either discrete "dummy variables" or regular continuous variables. εn is a random variable specifying "noise" or "error" in the prediction, assumed to follow some specified distribution. Normally, if there is a mean or variance parameter in the distribution, it cannot be identified, so the parameters are set to convenient values, by convention usually mean 0 and variance 1.

The person takes the action, Yn = 1, if Un > 0. The unobserved term, εn, is commonly assumed to have a logistic distribution, although other distributions such as the standard normal are also used.

The specification is written succinctly as:

    • Un = β · sn + εn
    •  Y_n = \begin{cases}
1, & \text{if } U_n > 0, \\
0, & \text{if } U_n \le 0
\end{cases}
    • εn ∼ logistic, standard normal, etc.
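
The specification above can be read as a recipe for generating data. The following is a minimal sketch, assuming standard logistic errors drawn by inverse-CDF sampling; the coefficient and feature values, and the helper names `sample_logistic` and `simulate_choice`, are hypothetical and chosen only for illustration.

```python
import math
import random


def sample_logistic(rng):
    """Draw one value from the standard logistic distribution (inverse-CDF method)."""
    u = rng.random()
    # random() returns values in [0, 1); redraw on exactly 0, where log is undefined.
    while u == 0.0:
        u = rng.random()
    return math.log(u / (1.0 - u))


def simulate_choice(beta, s, rng):
    """Return 1 if the latent utility U_n = beta . s_n + eps_n is positive, else 0."""
    utility = sum(b * x for b, x in zip(beta, s)) + sample_logistic(rng)
    return 1 if utility > 0 else 0


rng = random.Random(0)   # fixed seed so the run is reproducible
beta = [0.5, -1.0]       # hypothetical regression coefficients
s_n = [1.0, 0.3]         # hypothetical features; the first entry acts as an intercept
draws = [simulate_choice(beta, s_n, rng) for _ in range(10)]
```

Only the binary outcome is returned; the utility itself stays hidden, which mirrors the fact that Un is latent and only Yn is observed.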

Let us write it slightly differently:

    • Un = β · sn - en
    •  Y_n = \begin{cases}
1, & \text{if } U_n > 0, \\
0, & \text{if } U_n \le 0
\end{cases}
    • en ∼ logistic, standard normal, etc.

Here we have made the substitution en = -εn. This changes a random variable into a slightly different one, defined over a negated domain. As it happens, the error distributions we usually consider (e.g. logistic distribution, standard normal distribution, standard Student's t-distribution, etc.) are symmetric about 0, and hence the distribution over en is identical to the distribution over εn.
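
The symmetry claim can be spelled out as a short derivation. As a sketch, assume εn is continuous with CDF F_\varepsilon (a symbol introduced here only for illustration) and symmetric about 0, so that F_\varepsilon(-x) = 1 - F_\varepsilon(x). Then for any x:

```latex
\begin{align}
\Pr(e_n \le x) &= \Pr(-\varepsilon_n \le x) \\
&= \Pr(\varepsilon_n \ge -x) \\
&= 1 - F_\varepsilon(-x) \\
&= F_\varepsilon(x)
\end{align}
```

so en and εn share the same CDF under these assumptions.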

Denote the cumulative distribution function (CDF) of e as F_e and the quantile function (inverse CDF) of e as F_e^{-1}.

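Note that, since en has a continuous distribution (so that strict and non-strict inequalities have the same probability),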


\begin{align}
\Pr(Y_n=1) &= \Pr(U_n > 0) \\
&= \Pr(\boldsymbol\beta \cdot \mathbf{s_n} - e_n > 0) \\
&= \Pr(-e_n > -\boldsymbol\beta \cdot \mathbf{s_n}) \\
&= \Pr(e_n \le \boldsymbol\beta \cdot \mathbf{s_n}) \\
&= F_e(\boldsymbol\beta \cdot \mathbf{s_n})
\end{align}
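
The identity Pr(Yn = 1) = F_e(β · sn) can be checked numerically. The sketch below assumes standard logistic errors and compares the simulated share of people taking the action with the logistic CDF evaluated at the linear index; the function names, coefficients, features, and sample size are illustrative assumptions.

```python
import math
import random


def logistic_cdf(x):
    """CDF of the standard logistic distribution."""
    return 1.0 / (1.0 + math.exp(-x))


def empirical_choice_rate(beta, s, n_draws, seed=0):
    """Simulate U_n = beta . s_n - e_n and return the fraction of draws with U_n > 0."""
    rng = random.Random(seed)
    index = sum(b * x for b, x in zip(beta, s))
    taken = 0
    for _ in range(n_draws):
        u = rng.random()
        while u == 0.0:  # avoid log(0) in the inverse-CDF step
            u = rng.random()
        e = math.log(u / (1.0 - u))  # standard logistic draw
        if index - e > 0:
            taken += 1
    return taken / n_draws


beta = [0.5, -1.0]   # hypothetical coefficients
s_n = [1.0, 0.3]     # hypothetical features
index = sum(b * x for b, x in zip(beta, s_n))

simulated = empirical_choice_rate(beta, s_n, n_draws=100_000)
theoretical = logistic_cdf(index)
print(f"simulated={simulated:.4f}  theoretical={theoretical:.4f}")
```

With a sample of this size the two numbers should agree to roughly two decimal places; the remaining gap is Monte Carlo noise.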

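Since Y_n is a Bernoulli trial, where \mathbb{E}[Y_n] = \Pr(Y_n = 1), we have

\mathbb{E}[Y_n] = F_e(\boldsymbol\beta \cdot \mathbf{s_n})

or equivalently

F_e^{-1}(\mathbb{E}[Y_n]) = \boldsymbol\beta \cdot \mathbf{s_n}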

Note that this is exactly equivalent to the binomial regression model expressed in the formalism of the generalized linear model.

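If e_n \sim \mathcal{N}(0,1), i.e. distributed as a standard normal distribution, then, writing \Phi for the standard normal CDF,

\Phi^{-1}(\mathbb{E}[Y_n]) = \boldsymbol\beta \cdot \mathbf{s_n}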

which is exactly a probit model.

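If e_n \sim \operatorname{Logistic}(0,1), i.e. distributed as a standard logistic distribution with mean 0 and scale parameter 1, then the corresponding quantile function is the logit function, and

\operatorname{logit}(\mathbb{E}[Y_n]) = \boldsymbol\beta \cdot \mathbf{s_n}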

which is exactly a logit model.
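
The two special cases differ only in which quantile function serves as the link. As a minimal sketch using the Python standard library (the value of the linear index is a hypothetical example), each link can be applied to the corresponding success probability to recover the linear predictor:

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def logistic_cdf(x):
    """CDF of the standard logistic distribution."""
    return 1.0 / (1.0 + math.exp(-x))


def logit(p):
    """Quantile function of the standard logistic distribution."""
    return math.log(p / (1.0 - p))


def probit(p):
    """Quantile function of the standard normal distribution."""
    return STD_NORMAL.inv_cdf(p)


index = 0.2  # hypothetical value of beta . s_n

p_logit = logistic_cdf(index)      # Pr(Y_n = 1) under logistic errors
p_probit = STD_NORMAL.cdf(index)   # Pr(Y_n = 1) under standard normal errors

# Applying each link to its own probability returns the linear index.
recovered_logit = logit(p_logit)
recovered_probit = probit(p_probit)
```

The same linear index maps to different probabilities under the two error assumptions, which is why logit and probit coefficients fitted to the same data are not directly comparable in scale.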

Note that the two formalisms, generalized linear models (GLMs) and discrete choice models, are equivalent in the case of simple binary choice models, but can be extended in differing ways:

  • GLMs can easily handle arbitrarily distributed response variables (dependent variables), not just the categorical or ordinal variables to which discrete choice models are limited by their nature. GLMs are also not limited to link functions that are quantile functions of some distribution, unlike formulations based on an error variable, which must by assumption have a probability distribution.
  • On the other hand, because discrete choice models are described as types of generative models, it is conceptually easier to extend them to complicated situations with multiple, possibly correlated, choices for each person, or other variations.
