Regression Estimation - Underlying Assumptions

Underlying Assumptions

Classical assumptions for regression analysis include:

  • The sample is representative of the population about which inferences or predictions are to be made.
  • The error is a random variable with a mean of zero conditional on the explanatory variables.
  • The independent variables are measured without error. (If this does not hold, errors-in-variables models can be used instead.)
  • The predictors are linearly independent, that is, no predictor can be expressed as a linear combination of the others.
  • The errors are uncorrelated, that is, the variance–covariance matrix of the errors is diagonal and each diagonal element is the variance of the corresponding error.
  • The variance of the error is constant across observations (homoscedasticity). If it is not, weighted least squares or other methods may be used instead.
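Two of the assumptions above, linearly independent predictors and constant error variance, can be made concrete with a small sketch. The Python below is a minimal illustration, not a reference implementation. It solves the weighted normal equations (X^T W X) beta = X^T W y with hand-written Gaussian elimination so that it needs only the standard library. The helper names solve and wls are hypothetical, and the singularity tolerance of 1e-12 is an assumed, scale-dependent cutoff. A singular X^T W X signals linearly dependent predictors. With all weights equal to 1 the fit reduces to ordinary least squares. Under heteroscedasticity, a common choice is weights proportional to the inverse of each observation's error variance.

```python
def solve(A, b, tol=1e-12):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting.

    Raises ValueError when a pivot falls below tol (an assumed, scale-dependent
    cutoff). For normal equations this indicates linearly dependent predictors.
    """
    n = len(A)
    # Augmented matrix [A | b], copied so the caller's lists are untouched.
    M = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(A)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry in this column up.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < tol:
            raise ValueError("singular system: predictors are linearly dependent")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def wls(X, y, w):
    """Weighted least squares: minimise sum_i w_i * (y_i - X_i . beta)^2.

    Solves (X^T W X) beta = X^T W y with W = diag(w). With every w_i = 1 this
    is ordinary least squares. X is a list of rows; include a column of 1s
    for an intercept.
    """
    n, p = len(X), len(X[0])
    XtWX = [[sum(w[i] * X[i][a] * X[i][c] for i in range(n)) for c in range(p)]
            for a in range(p)]
    XtWy = [sum(w[i] * X[i][a] * y[i] for i in range(n)) for a in range(p)]
    return solve(XtWX, XtWy)


# Exact line y = 1 + 2x: any positive weights recover beta = [1, 2].
X = [[1, 0], [1, 1], [1, 2], [1, 3]]
print(wls(X, [1, 3, 5, 7], [1, 2, 3, 4]))
```

For heteroscedastic data one would call, for example, wls(X, y, [1 / v for v in variances]), where variances holds known (here assumed) per-observation error variances. Forming the normal equations squares the condition number of X, so production code usually relies on orthogonal-decomposition solvers (QR or SVD) instead.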

These are sufficient conditions for the least-squares estimator to possess desirable properties; in particular, these assumptions imply that the parameter estimates will be unbiased, consistent, and efficient in the class of linear unbiased estimators. Actual data rarely satisfy all of these assumptions, so the method is often used even though they do not hold exactly. The degree of departure from the assumptions can sometimes serve as a measure of how far the model is from being useful. Many of these assumptions may be relaxed in more advanced treatments. Reports of statistical analyses usually include tests on the sample data and diagnostics that assess the fit and usefulness of the model.
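A quick simulation illustrates what unbiasedness means here. The sketch below uses a hypothetical setup rather than anything prescribed by the text. It draws many samples from an assumed true model y = 1 + 2x + e with a fixed design and independent Normal(0, 1) errors, fits ordinary least squares to each sample, and averages the estimates. The averages should land close to 1 and 2. Setting mean_shift to a non-zero value makes the error mean depend on x, which violates the zero-conditional-mean assumption, and the average slope then drifts to 2 + mean_shift. The sample size, number of repetitions, and seed are arbitrary choices, and the sketch checks unbiasedness only, not consistency or efficiency.

```python
import random
import statistics


def ols_simple(x, y):
    """Closed-form OLS for y = a + b*x: b = Sxy / Sxx, a = ybar - b * xbar."""
    xbar, ybar = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx
    return ybar - b * xbar, b


def average_estimates(reps=2000, mean_shift=0.0, seed=0):
    """Average OLS estimates over many simulated samples.

    Assumed true model: y = 1 + 2x + e with e ~ Normal(mean_shift * x, 1).
    mean_shift = 0 satisfies the zero-conditional-mean assumption; any other
    value makes the error mean depend on x and violates it.
    """
    rng = random.Random(seed)
    x = [float(i) for i in range(20)]  # fixed design, measured without error
    a_sum = b_sum = 0.0
    for _ in range(reps):
        # Independent, equal-variance errors: uncorrelated and homoscedastic.
        y = [1.0 + 2.0 * xi + rng.gauss(mean_shift * xi, 1.0) for xi in x]
        a, b = ols_simple(x, y)
        a_sum += a
        b_sum += b
    return a_sum / reps, b_sum / reps


print(average_estimates())                # both close to (1, 2)
print(average_estimates(mean_shift=0.1))  # slope drifts towards 2.1
```

Any single sample's estimates scatter around the true values; it is only their average over repeated samples that the unbiasedness property describes.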

The assumptions also concern the geometrical support of the variables. Independent and dependent variables often refer to values measured at point locations, and spatial trends and spatial autocorrelation in these variables can violate the statistical assumptions of regression. Geographically weighted regression is one technique for dealing with such data. Variables may also be values aggregated by area. With aggregated data, the modifiable areal unit problem can cause extreme variation in the regression parameters: when data are aggregated by political boundaries, postal codes, or census areas, the results may be very different with a different choice of units.
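Geographically weighted regression fits a separate weighted least-squares regression at each location, giving nearby observations more weight than distant ones. The Python sketch below shows that idea for a single predictor under stated assumptions: a fixed Gaussian kernel w_i = exp(-0.5 (d_i / h)^2) with a user-chosen bandwidth h, planar coordinates, and a hypothetical helper gwr_local_fit. Full GWR implementations also select the bandwidth from the data (for example by cross-validation) and may use adaptive kernels, none of which is shown here.

```python
import math


def gwr_local_fit(coords, x, y, target, bandwidth):
    """Fit y = a + b*x at one location by kernel-weighted least squares.

    Hypothetical helper: weights come from a fixed Gaussian kernel,
    w_i = exp(-0.5 * (d_i / bandwidth) ** 2), where d_i is the planar distance
    from observation i to target. Returns the local (a, b).
    """
    w = [math.exp(-0.5 * (math.dist(c, target) / bandwidth) ** 2) for c in coords]
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return ybar - b * xbar, b


# Toy data: a western cluster with y = 1 + x and an eastern one with y = 1 + 3x.
coords = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 0), (10, 1), (11, 0), (11, 1)]
x = [0, 1, 2, 3, 0, 1, 2, 3]
y = [1, 2, 3, 4, 1, 4, 7, 10]

print(gwr_local_fit(coords, x, y, (0.5, 0.5), bandwidth=1.0))   # local slope near 1
print(gwr_local_fit(coords, x, y, (10.5, 0.5), bandwidth=1.0))  # local slope near 3
print(gwr_local_fit(coords, x, y, (5.5, 0.5), bandwidth=1e6))   # ~global fit, slope near 2
```

The local fits recover the two regional slopes, while a very large bandwidth, which is effectively a single global regression, returns a pooled slope of about 2 that describes neither region.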

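The modifiable areal unit problem can be reproduced with a handful of made-up numbers. In the sketch below, six hypothetical individuals are grouped into three zones in two different ways. Each zone is replaced by the mean of its members, and a simple least-squares slope is fitted to the zone means. The helper names and data are illustrative only. For these numbers the individual-level slope is 11/17.5 (about 0.63), zoning A gives 0.5, and zoning B gives 1.0, so the estimated relationship depends on how the units were drawn.

```python
import statistics


def ols_slope(xs, ys):
    """Least-squares slope of ys on xs: Sxy / Sxx."""
    xbar, ybar = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    return sxy / sxx


def aggregate(xs, ys, zones):
    """Replace individual values by zone means: one (x, y) row per zone."""
    zx = [statistics.fmean(xs[i] for i in zone) for zone in zones]
    zy = [statistics.fmean(ys[i] for i in zone) for zone in zones]
    return zx, zy


# Hypothetical individual-level data and two alternative zonings of it.
xs = [0, 1, 2, 3, 4, 5]
ys = [0, 2, 1, 3, 2, 4]
zoning_a = [[0, 1], [2, 3], [4, 5]]
zoning_b = [[0, 2], [1, 4], [3, 5]]

print(ols_slope(xs, ys))                        # individual level
print(ols_slope(*aggregate(xs, ys, zoning_a)))  # zone means, zoning A
print(ols_slope(*aggregate(xs, ys, zoning_b)))  # zone means, zoning B
```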