Omitted-variable Bias in Linear Regression
Two conditions must hold for omitted-variable bias to exist in linear regression:
- the omitted variable must be a determinant of the dependent variable (i.e., its true regression coefficient is not zero); and
- the omitted variable must be correlated with one or more of the included independent variables.
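The second condition is essential: a simulation sketch (variable names and the particular coefficients 2.0 and 3.0 are illustrative, not from the text) shows that when the omitted variable is *uncorrelated* with the included regressor, omitting it leaves the coefficient estimate unbiased.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

x = rng.normal(size=n)
z = rng.normal(size=n)                 # independent of x: condition 2 fails
y = 2.0 * x + 3.0 * z + rng.normal(size=n)

# "Short" regression of y on x alone (with an intercept), omitting z.
X = np.column_stack([np.ones(n), x])
slope = np.linalg.lstsq(X, y, rcond=None)[0][1]

print(slope)  # close to the true 2.0: no omitted-variable bias here
```

Here z still determines y (condition 1 holds), but because x and z are uncorrelated, the omitted z acts only as extra noise and the slope on x remains centered at its true value.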
As an example, consider a linear model of the form

    y_i = x_i \beta + z_i \delta + u_i, \qquad i = 1, \dots, n,

where
- x_i is a 1 × p row vector and is part of the observed data;
- β is a p × 1 column vector of unobservable parameters to be estimated;
- z_i is a scalar and is part of the observed data;
- δ is a scalar and is an unobservable parameter to be estimated;
- the error terms u_i are unobservable random variables with expected value 0 (conditional on x_i and z_i);
- the dependent variables y_i are part of the observed data.
We let X denote the n × p matrix whose i-th row is x_i, and let Y, Z, and U denote the n × 1 column vectors whose i-th entries are y_i, z_i, and u_i respectively, so that the model can be written as

    Y = X\beta + Z\delta + U.
Then, through the usual least squares calculation, the estimated parameter vector based only on the observed x-values, omitting the observed z-values, is given by

    \hat{\beta} = (X'X)^{-1} X'Y

(where the "prime" notation denotes the transpose of a matrix).
Substituting for Y based on the assumed linear model,

    \hat{\beta} = (X'X)^{-1} X'(X\beta + Z\delta + U)
                = \beta + (X'X)^{-1} X'Z\delta + (X'X)^{-1} X'U.
On taking expectations, the contribution of the final term is zero; this follows from the assumption that U has zero expectation conditional on X and Z. On simplifying the remaining terms:

    E[\hat{\beta} \mid X, Z] = \beta + (X'X)^{-1} X'Z\delta.
The second term above is the omitted-variable bias in this case. It is non-zero whenever X'Z ≠ 0, i.e., whenever the omitted variable is correlated with at least one of the included regressors. Note that (X'X)^{-1} X'Z is the coefficient vector from regressing Z on X, so the bias equals δ times the portion of z_i which is "explained" by x_i.
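The derivation above can be checked numerically. The following sketch (variable names and the coefficients β = 2, δ = 3, and the 0.5 correlation structure are illustrative assumptions, not from the text) fits the short regression that omits z and compares the resulting deviation from the true coefficient with the bias term (X'X)^{-1} X'Zδ.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# True model: y_i = beta*x_i + delta*z_i + u_i, with z correlated with x.
beta_true, delta = 2.0, 3.0
x = rng.normal(size=n)
z = 0.5 * x + rng.normal(size=n)        # corr(x, z) != 0
u = rng.normal(size=n)
y = beta_true * x + delta * z + u

X = np.column_stack([np.ones(n), x])     # intercept column plus x
Z = z.reshape(-1, 1)

# Short regression omitting z: beta_hat = (X'X)^{-1} X'Y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Bias predicted by the formula: (X'X)^{-1} X'Z * delta
predicted_bias = (np.linalg.solve(X.T @ X, X.T @ Z) * delta).ravel()

print(beta_hat[1])                       # far from the true 2.0
print(beta_hat[1] - beta_true)           # ~ predicted_bias[1]
```

With this setup the regression of z on x has slope about 0.5, so the slope estimate on x is pulled to roughly β + 0.5δ = 3.5, and the realized deviation agrees with the formula's bias term up to the small (X'X)^{-1} X'U sampling noise.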