# Ordinary Least Squares - Large Sample Properties

The least squares estimators are point estimates of the linear regression model parameters β. However, we generally also want to know how close those estimates might be to the true values of the parameters. In other words, we want to construct interval estimates.

Since we haven't made any assumption about the distribution of the error term $\varepsilon_i$, it is impossible to infer the distribution of the estimators $\hat\beta$ and $\hat\sigma^2$. Nevertheless, we can apply the law of large numbers and the central limit theorem to derive their asymptotic properties as the sample size n goes to infinity. In practice the sample size is of course finite; however, it is customary to assume that n is "large enough" so that the true distribution of the OLS estimator is close to its asymptotic limit, and the former may be approximately replaced by the latter.

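We can show that, under the model assumptions, the least squares estimator for β is consistent (that is, $\hat\beta$ converges in probability to β) and asymptotically normal:

$$\sqrt{n}\,(\hat\beta - \beta)\ \xrightarrow{d}\ \mathcal{N}\big(0,\ \sigma^2 Q_{xx}^{-1}\big),$$

where $Q_{xx} = \operatorname{E}[x_i x_i^{\mathsf T}]$ is the second-moment matrix of the regressors, estimated in practice by $\tfrac{1}{n}X^{\mathsf T}X$.

Using this asymptotic distribution, approximate two-sided confidence intervals for the j-th component of the vector β can be constructed as $\beta_j \in \bigg[\ \hat\beta_j \pm q^{\mathcal{N}(0,1)}_{1-\alpha/2}\!\sqrt{\tfrac{1}{n}\hat\sigma^2\big[Q_{xx}^{-1}\big]_{jj}} \ \bigg]$ at the 1 − α confidence level,

where q denotes the quantile function of the standard normal distribution, and $[\cdot]_{jj}$ is the j-th diagonal element of a matrix.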
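The interval for a single coefficient can be illustrated with a minimal sketch. It assumes a simple regression with an intercept and one regressor, so that the 2×2 second-moment matrix can be inverted by hand, and it estimates $Q_{xx}$ by the sample matrix $\tfrac{1}{n}X^{\mathsf T}X$. The function names `fit_simple_ols` and `coef_confidence_intervals` are hypothetical and chosen only for this illustration; the variance estimate uses the divisor n, which is asymptotically equivalent to n − p.

```python
import math
import random
from statistics import NormalDist


def fit_simple_ols(xs, ys):
    """Least squares fit of y = b0 + b1*x (hypothetical helper).

    Returns (beta_hat, sigma2_hat, q_inv), where q_inv is the inverse of the
    sample second-moment matrix (1/n) X'X for the regressor vector (1, x).
    """
    n = len(xs)
    m1 = sum(xs) / n                      # sample mean of x
    m2 = sum(x * x for x in xs) / n       # sample mean of x^2
    det = m2 - m1 * m1                    # determinant of [[1, m1], [m1, m2]]
    q_inv = [[m2 / det, -m1 / det],
             [-m1 / det, 1.0 / det]]
    c0 = sum(ys) / n                      # first entry of (1/n) X'y
    c1 = sum(x * y for x, y in zip(xs, ys)) / n
    beta_hat = [q_inv[0][0] * c0 + q_inv[0][1] * c1,
                q_inv[1][0] * c0 + q_inv[1][1] * c1]
    resid = [y - beta_hat[0] - beta_hat[1] * x for x, y in zip(xs, ys)]
    # Assumption: divisor n; using n - p instead is asymptotically equivalent.
    sigma2_hat = sum(r * r for r in resid) / n
    return beta_hat, sigma2_hat, q_inv


def coef_confidence_intervals(xs, ys, alpha=0.05):
    """Approximate (1 - alpha) intervals for each coefficient."""
    beta_hat, sigma2_hat, q_inv = fit_simple_ols(xs, ys)
    n = len(xs)
    z = NormalDist().inv_cdf(1 - alpha / 2)   # standard normal quantile
    intervals = []
    for j, b in enumerate(beta_hat):
        half = z * math.sqrt(sigma2_hat * q_inv[j][j] / n)
        intervals.append((b - half, b + half))
    return beta_hat, intervals


if __name__ == "__main__":
    # Simulated data with deliberately non-normal (uniform) errors.
    random.seed(0)
    xs = [random.uniform(0, 10) for _ in range(500)]
    ys = [1.0 + 2.0 * x + random.uniform(-3, 3) for x in xs]
    beta_hat, intervals = coef_confidence_intervals(xs, ys)
    for b, (lo, hi) in zip(beta_hat, intervals):
        print(f"estimate {b:.3f}, 95% interval [{lo:.3f}, {hi:.3f}]")
```

The errors in the usage example are uniform rather than normal, which is the setting where the asymptotic argument is needed. The intervals are only approximate, and their quality depends on n being large.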

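Similarly, the least squares estimator for $\sigma^2$ is also consistent and asymptotically normal (provided that the fourth moment of $\varepsilon_i$ exists) with limiting distribution

$$\sqrt{n}\,(\hat\sigma^2 - \sigma^2)\ \xrightarrow{d}\ \mathcal{N}\big(0,\ \operatorname{E}[\varepsilon_i^4] - \sigma^4\big).$$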
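As a rough illustration, the limiting variance $\operatorname{E}[\varepsilon_i^4] - \sigma^4$ of $\sqrt{n}\,(\hat\sigma^2 - \sigma^2)$ can be estimated by replacing the error moments with sample moments of the residuals. The sketch below does this to form an approximate interval for $\sigma^2$. The plug-in step is an assumption of this example, and the function name `sigma2_confidence_interval` is hypothetical.

```python
import math
from statistics import NormalDist


def sigma2_confidence_interval(residuals, alpha=0.05):
    """Approximate (1 - alpha) interval for the error variance.

    Assumption: residual moments are used as plug-in estimates of the
    error moments E[eps^2] and E[eps^4].
    """
    n = len(residuals)
    sigma2_hat = sum(r * r for r in residuals) / n     # second moment
    m4_hat = sum(r ** 4 for r in residuals) / n        # fourth moment
    # Plug-in estimate of E[eps^4] - sigma^4; clipped at zero to guard
    # against tiny negative values caused by floating-point rounding.
    avar = max(m4_hat - sigma2_hat ** 2, 0.0)
    z = NormalDist().inv_cdf(1 - alpha / 2)            # standard normal quantile
    half = z * math.sqrt(avar / n)
    return sigma2_hat, (sigma2_hat - half, sigma2_hat + half)


if __name__ == "__main__":
    est, (lo, hi) = sigma2_confidence_interval([0.4, -1.1, 0.3, 0.9, -0.5, 0.0])
    print(f"sigma^2 estimate {est:.3f}, 95% interval [{lo:.3f}, {hi:.3f}]")
```

Because the variance estimator can be strongly skewed in small samples, the lower endpoint of this symmetric interval may fall below zero; the approximation is only meaningful for large n.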

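These asymptotic distributions can be used for prediction, testing hypotheses, constructing other estimators, etc. As an example, consider the problem of prediction. Suppose $x_0$ is some point within the domain of distribution of the regressors, and one wants to know what the response variable would have been at that point. The mean response is the quantity $y_0 = x_0^{\mathsf T}\beta$, whereas the predicted response is $\hat y_0 = x_0^{\mathsf T}\hat\beta$. Clearly the predicted response is a random variable; its distribution can be derived from that of $\hat\beta$:

$$\sqrt{n}\,(\hat y_0 - y_0)\ \xrightarrow{d}\ \mathcal{N}\big(0,\ \sigma^2 x_0^{\mathsf T} Q_{xx}^{-1} x_0\big),$$

with $Q_{xx} = \operatorname{E}[x_i x_i^{\mathsf T}]$ the second-moment matrix of the regressors. This allows confidence intervals for the mean response $y_0$ to be constructed:

$$y_0 \in \bigg[\ x_0^{\mathsf T}\hat\beta \pm q^{\mathcal{N}(0,1)}_{1-\alpha/2}\!\sqrt{\tfrac{1}{n}\hat\sigma^2\, x_0^{\mathsf T} Q_{xx}^{-1} x_0}\ \bigg]$$

at the 1 − α confidence level.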
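The mean-response interval can be sketched in the same way. The example below assumes that a coefficient estimate, a variance estimate and the inverse of the sample second-moment matrix $\tfrac{1}{n}X^{\mathsf T}X$ are already available from a previous fit. The function name `mean_response_interval`, its argument names and the numbers in the usage example are hypothetical.

```python
import math
from statistics import NormalDist


def mean_response_interval(x0, beta_hat, sigma2_hat, q_inv, n, alpha=0.05):
    """Approximate (1 - alpha) interval for the mean response x0' beta.

    x0         : regressor vector at the point of interest (list of floats)
    beta_hat   : estimated coefficients (list of floats)
    sigma2_hat : estimated error variance
    q_inv      : inverse of the sample second-moment matrix (list of lists)
    n          : sample size used for the fit
    """
    p = len(beta_hat)
    # Point prediction x0' beta_hat.
    y0_hat = sum(x0[j] * beta_hat[j] for j in range(p))
    # Quadratic form x0' Q^{-1} x0.
    quad = sum(x0[i] * q_inv[i][j] * x0[j] for i in range(p) for j in range(p))
    z = NormalDist().inv_cdf(1 - alpha / 2)   # standard normal quantile
    half = z * math.sqrt(sigma2_hat * quad / n)
    return y0_hat, (y0_hat - half, y0_hat + half)


if __name__ == "__main__":
    # Illustrative inputs for a model with intercept and one regressor.
    beta_hat = [1.15, 1.9]
    q_inv = [[2.8, -1.2], [-1.2, 0.8]]
    y0_hat, (lo, hi) = mean_response_interval(
        x0=[1.0, 1.5], beta_hat=beta_hat, sigma2_hat=0.1125, q_inv=q_inv, n=4
    )
    print(f"predicted mean response {y0_hat:.3f}, 95% interval [{lo:.3f}, {hi:.3f}]")
```

This interval covers the mean response only. An interval for a new individual observation would be wider, since it must also account for the error term of that observation. The sample size of 4 in the usage example is far too small for the asymptotic approximation and serves only to keep the arithmetic easy to follow.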