Local Regression - Localized Subsets of Data

The subsets of data used for each weighted least squares fit in LOESS are determined by a nearest-neighbors algorithm. A user-specified input to the procedure, called the "bandwidth" or "smoothing parameter", determines how much of the data is used to fit each local polynomial. The smoothing parameter, α, is a number between (λ + 1)/n and 1, with λ denoting the degree of the local polynomial and n the number of data points. The value of α is the proportion of data used in each fit. The subset of data used in each weighted least squares fit comprises the nα points (rounded up to the next integer) whose explanatory-variable values are closest to the point at which the response is being estimated.
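The neighbor selection described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `local_subset`, its arguments, and the one-dimensional sample data are assumptions made for the example. Here `alpha` stands for the smoothing parameter and `degree` for the degree of the local polynomial.

```python
import math

def local_subset(x, x0, alpha, degree=1):
    """Return indices of the points of x used for the local fit at x0.

    Hypothetical helper: picks the ceil(n * alpha) points whose
    explanatory values are closest to x0 (one-dimensional case).
    """
    n = len(x)
    lower = (degree + 1) / n  # need at least degree + 1 points per fit
    if not (lower <= alpha <= 1):
        raise ValueError(f"alpha must lie between {lower} and 1")
    q = min(n, math.ceil(n * alpha))  # subset size, rounded up
    # Rank all points by distance to x0 and keep the q nearest.
    by_distance = sorted(range(n), key=lambda i: abs(x[i] - x0))
    return sorted(by_distance[:q])

x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
print(local_subset(x, x0=4.2, alpha=0.25))  # 10 * 0.25 = 2.5, so 3 points
```

With ten points and a smoothing parameter of 0.25, the product is 2.5, which rounds up to three neighbors; a value of 1 would use every point in every fit.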

The parameter α is called the smoothing parameter because it controls the flexibility of the LOESS regression function. Large values of α produce the smoothest functions, which wiggle the least in response to fluctuations in the data. The smaller α is, the more closely the regression function will conform to the data. Using too small a value of the smoothing parameter is not desirable, however, since the regression function will eventually start to capture the random error in the data. Useful values of the smoothing parameter typically lie in the range 0.25 to 0.5 for most LOESS applications.
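The effect of the smoothing parameter can be seen in a small experiment. The sketch below is illustrative only: it assumes a local linear fit with tricube weights, synthetic noisy sine data, and two hypothetical summary measures (`roughness` and `rss`), none of which are prescribed by the text above.

```python
import math
import random

def loess_point(x, y, x0, alpha):
    """Local linear estimate at x0 (assumed: degree 1, tricube weights)."""
    n = len(x)
    q = min(n, math.ceil(n * alpha))
    nearest = sorted(range(n), key=lambda i: abs(x[i] - x0))[:q]
    h = max(abs(x[i] - x0) for i in nearest) or 1.0  # local bandwidth
    w = [(1 - (abs(x[i] - x0) / h) ** 3) ** 3 for i in nearest]
    sw = sum(w)
    mx = sum(wi * x[i] for wi, i in zip(w, nearest)) / sw
    my = sum(wi * y[i] for wi, i in zip(w, nearest)) / sw
    sxx = sum(wi * (x[i] - mx) ** 2 for wi, i in zip(w, nearest))
    sxy = sum(wi * (x[i] - mx) * (y[i] - my) for wi, i in zip(w, nearest))
    slope = sxy / sxx if sxx > 0 else 0.0
    return my + slope * (x0 - mx)

def roughness(fit):
    """Sum of squared second differences: larger means more wiggle."""
    return sum((fit[i - 1] - 2 * fit[i] + fit[i + 1]) ** 2
               for i in range(1, len(fit) - 1))

def rss(y, fit):
    """Residual sum of squares: smaller means closer to the data."""
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fit))

rng = random.Random(0)  # fixed seed so the synthetic data is repeatable
x = [2 * math.pi * i / 49 for i in range(50)]
y = [math.sin(xi) + rng.gauss(0, 0.3) for xi in x]

fit_small = [loess_point(x, y, xi, 0.1) for xi in x]  # small alpha
fit_large = [loess_point(x, y, xi, 0.5) for xi in x]  # large alpha

print("roughness:", roughness(fit_small), roughness(fit_large))
print("rss:      ", rss(y, fit_small), rss(y, fit_large))
```

In this setup the fit with the small parameter follows the data more closely (lower residual sum of squares) but wiggles more (higher roughness), which is the trade-off the paragraph describes.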
