Regression Analysis - Linear Regression

Linear Regression

In linear regression, the model specification is that the dependent variable, $y_i$, is a linear combination of the parameters (but need not be linear in the independent variables). For example, in simple linear regression for modeling $n$ data points there is one independent variable, $x_i$, and two parameters, $\beta_0$ and $\beta_1$:

straight line: $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \quad i = 1, \dots, n.$

(In multiple linear regression, there are several independent variables or functions of independent variables.)

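Adding a term in $x_i^2$ to the preceding regression gives:

parabola: $y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \varepsilon_i, \quad i = 1, \dots, n.$

This is still linear regression; although the expression on the right-hand side is quadratic in the independent variable $x_i$, it is linear in the parameters $\beta_0$, $\beta_1$ and $\beta_2$.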
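A minimal sketch of why the parabola is still a linear least-squares problem: the columns $1$, $x$ and $x^2$ form a design matrix, and the fit is linear in the coefficients. The data below are hypothetical and noise-free, generated from $y = 1 + 2x + 3x^2$, and the example assumes NumPy is available.

```python
import numpy as np

# Hypothetical noise-free data generated from y = 1 + 2x + 3x^2.
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
y = 1.0 + 2.0 * x + 3.0 * x**2

# Design matrix with columns 1, x, x^2. The model is quadratic in x
# but linear in the coefficients, so linear least squares applies.
X = np.column_stack([np.ones_like(x), x, x**2])

# Solve min ||X @ beta - y||^2 for beta = (beta_0, beta_1, beta_2).
beta, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)
print(beta)
```

Because the data contain no noise, the recovered coefficients should match the generating values up to floating-point error.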

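In both cases, $\varepsilon_i$ is an error term and the subscript $i$ indexes a particular observation. Given a random sample from the population, we estimate the population parameters and obtain the sample linear regression model (shown here for the straight-line case):

$\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i.$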

The residual, $e_i = y_i - \hat{y}_i$, is the difference between the true value of the dependent variable, $y_i$, and the value predicted by the model, $\hat{y}_i$. One method of estimation is ordinary least squares. This method obtains parameter estimates that minimize the sum of squared residuals, SSE, also sometimes denoted RSS:

$\mathrm{SSE} = \sum_{i=1}^{n} e_i^2.$

Minimization of this function results in a set of normal equations, a set of simultaneous linear equations in the parameters, which are solved to yield the parameter estimators, $\hat{\beta}_0$ and $\hat{\beta}_1$.
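As an illustration, for the straight-line model, setting the partial derivatives of the sum of squared residuals to zero gives two normal equations: $n\beta_0 + \beta_1 \sum x_i = \sum y_i$ and $\beta_0 \sum x_i + \beta_1 \sum x_i^2 = \sum x_i y_i$. The sketch below solves this 2-by-2 system by Cramer's rule. The data values are hypothetical and chosen only for illustration.

```python
# Hypothetical sample, chosen only for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.2, 4.1, 6.2, 7.9, 10.1]
n = len(x)

# Sums that appear in the normal equations.
sum_x = sum(x)
sum_y = sum(y)
sum_xx = sum(xi * xi for xi in x)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))

# Normal equations for y = b0 + b1 * x:
#   n * b0     + sum_x  * b1 = sum_y
#   sum_x * b0 + sum_xx * b1 = sum_xy
# Solve the 2x2 linear system with Cramer's rule.
det = n * sum_xx - sum_x * sum_x
b0 = (sum_y * sum_xx - sum_x * sum_xy) / det
b1 = (n * sum_xy - sum_x * sum_y) / det

print(b0, b1)
```

The determinant is nonzero whenever the $x$ values are not all identical, which is the condition for the straight-line fit to be uniquely determined.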

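In the case of simple regression, the formulas for the least squares estimates are

$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \quad \text{and} \quad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$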

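where $\bar{x}$ is the mean (average) of the $x$ values and $\bar{y}$ is the mean of the $y$ values. See simple linear regression for a derivation of these formulas and a numerical example. Under the assumption that the population error term has a constant variance, the estimate of that variance is given by:

$\hat{\sigma}_\varepsilon^2 = \frac{\mathrm{SSE}}{n - 2}.$

The denominator $n - 2$ is the sample size less the two parameters estimated from the same data.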

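This is called the mean square error (MSE) of the regression. The standard errors of the parameter estimates are given by

$\hat{\sigma}_{\beta_0} = \hat{\sigma}_\varepsilon \sqrt{\frac{1}{n} + \frac{\bar{x}^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}}$

$\hat{\sigma}_{\beta_1} = \hat{\sigma}_\varepsilon \sqrt{\frac{1}{\sum_{i=1}^{n} (x_i - \bar{x})^2}}$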

Under the further assumption that the population error term is normally distributed, the researcher can use these estimated standard errors to create confidence intervals and conduct hypothesis tests about the population parameters.
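The quantities described above for simple regression can be sketched in a few lines: the least squares estimates, the sum of squared residuals, the MSE with denominator $n - 2$, the two standard errors, and a 95% confidence interval for the slope. The data are hypothetical, and the critical value 3.182 (Student's t, 3 degrees of freedom, two-sided 95%) is hard-coded as an assumption rather than computed.

```python
import math

# Hypothetical sample, chosen only for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.2, 4.1, 6.2, 7.9, 10.1]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Centered sums of squares and cross-products.
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Least squares estimates for the straight-line model.
beta1_hat = s_xy / s_xx
beta0_hat = y_bar - beta1_hat * x_bar

# Residuals e_i = y_i - y_hat_i and their sum of squares.
residuals = [yi - (beta0_hat + beta1_hat * xi) for xi, yi in zip(x, y)]
sse = sum(e ** 2 for e in residuals)

# Estimated error variance (MSE), using n - 2 degrees of freedom.
mse = sse / (n - 2)
sigma_hat = math.sqrt(mse)

# Standard errors of the intercept and slope estimates.
se_beta0 = sigma_hat * math.sqrt(1.0 / n + x_bar ** 2 / s_xx)
se_beta1 = sigma_hat / math.sqrt(s_xx)

# Assumed critical value: Student's t, df = n - 2 = 3, two-sided 95%.
T_CRIT_95_DF3 = 3.182
ci_beta1 = (beta1_hat - T_CRIT_95_DF3 * se_beta1,
            beta1_hat + T_CRIT_95_DF3 * se_beta1)

print(beta0_hat, beta1_hat, mse, se_beta0, se_beta1, ci_beta1)
```

For a different sample size the critical value would need to be looked up again for $n - 2$ degrees of freedom.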
