Data Transformation in Regression

Linear regression is a statistical technique for relating a dependent variable Y to one or more independent variables X. The simplest regression models capture a linear relationship between the expected value of Y and each independent variable (when the other independent variables are held fixed). If linearity fails to hold, even approximately, it is sometimes possible to transform either the independent or dependent variables in the regression model to improve the linearity.
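As a sketch of the idea, the following hypothetical example (using NumPy, with made-up exponential data) shows how a log transform of the dependent variable can improve linearity: a straight line fits the raw data poorly, but fits log(Y) against X closely.

```python
import numpy as np

# Hypothetical data: Y grows roughly exponentially in X with small
# multiplicative noise, so a straight line fits poorly on the raw scale
# but well after a log transform of Y.
rng = np.random.default_rng(0)
x = np.linspace(1.0, 5.0, 50)
y = 2.0 * np.exp(0.8 * x) * rng.lognormal(0.0, 0.05, size=x.size)

def r_squared(x, y):
    """R^2 of a simple least-squares line of y on x."""
    slope, intercept = np.polyfit(x, y, 1)
    resid = y - (slope * x + intercept)
    return 1.0 - resid.var() / y.var()

r2_raw = r_squared(x, y)          # linear fit on the raw scale
r2_log = r_squared(x, np.log(y))  # linear fit after transforming Y

print(f"R^2 raw: {r2_raw:.3f}, R^2 after log(Y): {r2_log:.3f}")
```

The transformed fit should recover a nearly perfect line here because the underlying relationship is exponential; with real data, the choice of transform (log, square root, reciprocal, etc.) depends on the shape of the nonlinearity.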

Another assumption of linear regression is that the variance of Y about its expected value is the same for every expected value (this is known as homoskedasticity). Normality is not needed for the least squares estimates of the regression parameters to be meaningful (see the Gauss-Markov theorem). However, confidence intervals and hypothesis tests will have better statistical properties if the residuals are approximately normally distributed. This can be assessed empirically by plotting the fitted values against the residuals and by inspecting the normal quantile plot of the residuals. Note that it is not relevant whether the dependent variable Y is marginally normally distributed.
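The residual-versus-fitted check described above can be sketched numerically as well as graphically. The example below (hypothetical data with noise whose standard deviation grows with X, i.e. deliberately heteroskedastic) fits a line and compares the residual spread in the lower and upper halves of the fitted values; roughly equal spreads are consistent with homoskedasticity, while a large ratio suggests the variance changes with the mean.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 200)

# Heteroskedastic data: the noise standard deviation grows with x,
# violating the equal-variance assumption.
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.2 + 0.3 * x, size=x.size)

slope, intercept = np.polyfit(x, y, 1)
fitted = slope * x + intercept
resid = y - fitted

# Crude diagnostic: compare residual spread in the lower and upper
# halves of the fitted values.
order = np.argsort(fitted)
half = x.size // 2
sd_low = resid[order[:half]].std()
sd_high = resid[order[half:]].std()

print(f"residual sd (low fitted): {sd_low:.2f}, (high fitted): {sd_high:.2f}")
```

In practice one would plot the residuals (and a normal quantile plot) rather than rely on a two-bucket summary, but the numeric version makes the fanning-out pattern concrete: here the upper half shows a clearly larger spread.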
