Robust Statistics - Replacing Outliers and Missing Values

Replacing Outliers and Missing Values

If relatively few points are missing, simple models can be used to estimate values to complete the series, such as replacing missing values with the mean or median of the data. Simple linear regression can also be used to estimate missing values (MacDonald and Zucchini, 1997; Harvey, 1989). In addition, outliers can sometimes be accommodated in the data through the use of trimmed means, scale estimators other than the standard deviation (e.g. the median absolute deviation, MAD) and Winsorization (McBean and Rovers, 1998). To calculate a trimmed mean, a fixed percentage of the data is dropped from each end of the ordered data set, thus eliminating the outliers; the mean is then calculated from the remaining data. Winsorizing accommodates an outlier by replacing it with the next highest or next lowest value, as appropriate (Rustum & Adeloye, 2007).
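The estimators named above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not an implementation taken from the cited works: the function names, the trimming proportion, the count `k` and the sample data are all assumptions chosen for the example.

```python
from statistics import mean, median


def fill_missing_with_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    centre = median(observed)
    return [centre if v is None else v for v in values]


def trimmed_mean(values, proportion):
    """Mean after dropping `proportion` of the points from each end of the sorted data."""
    ordered = sorted(values)
    k = int(proportion * len(ordered))  # number of points dropped per end
    return mean(ordered[k:len(ordered) - k])


def winsorize(values, k):
    """Replace the k smallest and k largest values with the nearest retained value."""
    if 2 * k >= len(values):
        raise ValueError("k is too large for this sample")
    ordered = sorted(values)
    low, high = ordered[k], ordered[-k - 1]
    return [min(max(v, low), high) for v in values]


def mad(values):
    """Median absolute deviation from the median (unscaled)."""
    centre = median(values)
    return median(abs(v - centre) for v in values)


# Hypothetical sample with one gross outlier (90).
data = [12, 14, 13, 15, 11, 13, 14, 90, 12, 13]

filled = fill_missing_with_median([12, None, 13, 15])
trimmed = trimmed_mean(data, 0.1)   # drops 11 and 90
winsorized = winsorize(data, 1)     # 11 becomes 12, 90 becomes 15
spread = mad(data)
```

For this sample the ordinary mean is pulled up to 20.7 by the single value 90, whereas the 10% trimmed mean is 13.25 and the MAD is 1.0. Note that Winsorizing keeps the sample size unchanged, while trimming discards points.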

However, using these types of models to predict missing values or outliers in a long time series is difficult and often unreliable, particularly if the number of values to be in-filled is relatively high in comparison with the total record length. The accuracy of the estimate depends on how good and representative the model is and on how long the period of missing values extends (Rosen and Lennox, 2001). In a dynamic process, any variable depends not just on the historical time series of the same variable but also on several other variables or parameters of the process. In other words, the problem is an exercise in multivariate analysis rather than the univariate approach of most of the traditional methods of estimating missing values and outliers; a multivariate model will therefore be more representative than a univariate one for predicting missing values. The Kohonen self-organising map (KSOM) offers a simple and robust multivariate model for data analysis, thus providing good possibilities for estimating a missing value by taking into account its relationship or correlation with other pertinent variables in the data record (Rustum & Adeloye, 2007).
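One common way a self-organising map is used for in-filling can be sketched as follows: the record with a gap is matched to its nearest map prototype using only the variables that were observed, and the missing variable is read off that prototype. The sketch below assumes the map has already been trained; the `codebook` values are hypothetical stand-ins for trained prototype vectors, and the function name is invented for illustration. It is not claimed to reproduce the procedure of the cited authors.

```python
def impute_from_codebook(record, codebook):
    """Fill None entries of `record` from the closest prototype vector.

    The squared Euclidean distance is computed over observed components only,
    so the missing variables do not influence which prototype is chosen.
    """
    observed = [i for i, v in enumerate(record) if v is not None]
    if not observed:
        raise ValueError("record has no observed components to match on")

    def distance(prototype):
        return sum((record[i] - prototype[i]) ** 2 for i in observed)

    best = min(codebook, key=distance)
    return [best[i] if v is None else v for i, v in enumerate(record)]


# Hypothetical prototypes for three correlated process variables.
codebook = [
    [1.0, 10.0, 0.5],
    [2.0, 20.0, 1.0],
    [3.0, 30.0, 1.5],
]

# The second variable is missing; the first and third select the prototype.
completed = impute_from_codebook([2.1, None, 0.9], codebook)
```

In practice the variables would normally be scaled to comparable ranges before distances are computed, otherwise the variable with the largest numerical range dominates the match.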

Standard Kalman filters are not robust to outliers. To address this, Ting, Theodorou and Schaal have recently shown that a modification of Masreliez's theorem can deal with outliers.

One common approach to handling outliers in data analysis is to perform outlier detection first, followed by an efficient estimation method (e.g., least squares). While this approach is often useful, one must keep in mind two challenges. First, an outlier detection method that relies on a non-robust initial fit can suffer from the effect of masking, that is, a group of outliers can mask each other and escape detection (Rousseeuw and Leroy, 2007). Second, if a high-breakdown initial fit is used for outlier detection, the follow-up analysis might inherit some of the inefficiencies of the initial estimator (He and Portnoy, 1992).
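The masking effect described above can be illustrated with a small sketch. It compares a detection rule built on the mean and standard deviation (a non-robust initial fit) with one built on the median and MAD. The sample data, the function names and the cut-off of 3 are assumptions made for the example; the factor 1.4826 is the usual constant that makes the MAD comparable to the standard deviation under normality.

```python
from statistics import mean, median, stdev


def flag_by_zscore(values, threshold=3.0):
    """Indices whose distance from the mean exceeds `threshold` standard deviations."""
    centre, scale = mean(values), stdev(values)
    return [i for i, v in enumerate(values) if abs(v - centre) / scale > threshold]


def flag_by_mad(values, threshold=3.0):
    """Indices whose distance from the median exceeds `threshold` scaled MADs."""
    centre = median(values)
    scale = 1.4826 * median(abs(v - centre) for v in values)
    if scale == 0:
        raise ValueError("MAD is zero; this rule is undefined for the sample")
    return [i for i, v in enumerate(values) if abs(v - centre) / scale > threshold]


# Hypothetical sample: ten ordinary readings followed by two gross outliers.
data = [9, 10, 11, 10, 9, 11, 10, 10, 9, 11, 100, 100]

classical = flag_by_zscore(data)  # the two outliers inflate the mean and SD
robust = flag_by_mad(data)        # median and MAD are barely affected
```

Here the two values of 100 raise the mean to 25 and the standard deviation to about 35, so each lies only about 2.1 standard deviations from the mean and neither is flagged. The median (10) and MAD (1) are unaffected, and the robust rule flags both.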
