Bayesian Information Criterion - Mathematically

The BIC is an asymptotic result derived under the assumption that the data distribution is in the exponential family. Let:

  • x = the observed data;
  • n = the number of data points in x, the number of observations, or equivalently, the sample size;
  • k = the number of free parameters to be estimated. If the estimated model is a linear regression, k is the number of regressors, including the intercept;
  • p(x|k) = the probability of the observed data given the number of parameters; equivalently, the likelihood of the parameters given the dataset;
  • L = the maximized value of the likelihood function for the estimated model.

The formula for the BIC is:

$$-2 \cdot \ln p(x \mid k) \approx \mathrm{BIC} = -2 \cdot \ln L + k \ln(n).$$

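As a concrete illustration, here is a minimal sketch in Python of computing the BIC from these quantities; the function name and the numeric inputs are hypothetical:

```python
import math

def bic(log_likelihood: float, k: int, n: int) -> float:
    """Compute BIC = -2*ln(L) + k*ln(n), where log_likelihood is the
    maximized log-likelihood ln(L), k the number of free parameters,
    and n the sample size."""
    return -2.0 * log_likelihood + k * math.log(n)

# Hypothetical model: maximized log-likelihood -430.2, 3 free
# parameters, fitted to n = 100 observations.
print(bic(log_likelihood=-430.2, k=3, n=100))  # 860.4 + 13.8 ≈ 874.2
```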
Under the assumption that the model errors or disturbances are independent and identically distributed according to a normal distribution, and under the boundary condition that the derivative of the log-likelihood with respect to the true variance is zero, this becomes (up to an additive constant, which depends only on n and not on the model):

$$\mathrm{BIC} = n \ln\left(\widehat{\sigma_e^2}\right) + k \ln(n),$$

where $\widehat{\sigma_e^2}$ is the error variance.

The error variance in this case is defined as

$$\widehat{\sigma_e^2} = \frac{1}{n} \sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2.$$

From probability theory, $\widehat{\sigma_e^2}$ is a biased estimator for the true variance, $\sigma^2$. Let $\overline{\sigma_e^2}$ denote the unbiased estimator of the error variance. It is defined as

$$\overline{\sigma_e^2} = \frac{1}{n-1} \sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2.$$

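The contrast between the two estimators, and the Gaussian form of the BIC built on the biased one, can be sketched as follows (hypothetical data and parameter count, assuming numpy):

```python
import numpy as np

x = np.array([2.1, 1.9, 3.0, 2.4, 2.8, 1.7, 2.5, 2.2])  # hypothetical sample
n = x.size
k = 2  # hypothetical parameter count (e.g. mean and variance)

rss = np.sum((x - x.mean()) ** 2)
sigma2_ml = rss / n               # biased (maximum-likelihood) form: divide by n
sigma2_unbiased = rss / (n - 1)   # unbiased form: divide by n - 1

# The BIC expression above uses the biased form (up to the additive constant):
bic_value = n * np.log(sigma2_ml) + k * np.log(n)
print(sigma2_ml, sigma2_unbiased, bic_value)
```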
Additionally, under the assumption of normality, the following version may be more tractable:

$$\mathrm{BIC} = \chi^2 + k \ln(n).$$

Note that a constant is added in the transition from log-likelihood to $\chi^2$; however, in using the BIC to determine the "best" model, the constant becomes trivial, since it is the same for every model fitted to the same data.
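
For example, writing the additive constant as c (a label introduced here only for illustration), the BIC difference between two models fitted to the same data is

$$\Delta\mathrm{BIC} = \left(\chi_1^2 + k_1 \ln(n) + c\right) - \left(\chi_2^2 + k_2 \ln(n) + c\right) = \chi_1^2 - \chi_2^2 + \left(k_1 - k_2\right)\ln(n),$$

so c cancels and the comparison depends only on the fit terms and the parameter counts.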

Given any two estimated models, the model with the lower value of BIC is the one to be preferred. The BIC is an increasing function of $\widehat{\sigma_e^2}$ and an increasing function of k. That is, unexplained variation in the dependent variable and the number of explanatory variables both increase the value of BIC. Hence, a lower BIC implies either fewer explanatory variables, better fit, or both. The BIC generally penalizes free parameters more strongly than does the Akaike information criterion, though this depends on the size of n and the relative magnitude of n and k.
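
As a usage example, the following sketch (hypothetical data; the residual variance stands in for the error variance, per the Gaussian form above) compares a linear and a quadratic fit of the same dependent variable:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
y = 1.0 + 2.0 * x + rng.normal(scale=0.1, size=x.size)  # hypothetical data

def gaussian_bic(y, y_hat, k):
    """n*ln(sigma2_hat) + k*ln(n), with the biased (ML) residual variance."""
    n = y.size
    sigma2 = np.mean((y - y_hat) ** 2)
    return n * np.log(sigma2) + k * np.log(n)

for degree in (1, 2):  # straight line vs. quadratic
    coeffs = np.polyfit(x, y, degree)
    k = degree + 1     # regressors including the intercept
    print(degree, gaussian_bic(y, np.polyval(coeffs, x), k))
# The fit with the lower BIC is preferred: the quadratic's extra
# parameter must buy enough extra fit to offset its ln(n) penalty.
```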

It is important to keep in mind that the BIC can be used to compare estimated models only when the numerical values of the dependent variable are identical for all estimates being compared. The models being compared need not be nested, unlike the case when models are being compared using an F or likelihood ratio test.
