Popular Methods Used
- Parsing: In data cleansing, parsing is used to detect syntax errors. A parser decides whether a string of data is acceptable within the allowed data specification, much as a parser for a grammar decides whether a string belongs to a language.
- Data transformation: Data transformation maps data from its given format into the format expected by the target application. This includes value conversions or translation functions, as well as normalizing numeric values to conform to minimum and maximum values.
- Duplicate elimination: Duplicate detection requires an algorithm for determining whether data contains duplicate representations of the same entity. Usually, data is sorted by a key that brings duplicate entries close together for faster identification.
- Statistical methods: By analyzing the data using the mean, standard deviation, range, or clustering algorithms, an expert can find values that are unexpected and thus likely erroneous. Correcting such data is difficult because the true value is not known, but the problem can be resolved by setting the values to an average or other statistical value. Statistical methods can also be used to handle missing values, which can be replaced by one or more plausible values, usually obtained by extensive data augmentation algorithms.
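
The four methods above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions, not a reference implementation: the date format used as the "allowed data specification", the function names, the choice of min-max scaling, and the two-standard-deviation threshold are all hypothetical choices made for this example.

```python
import re
import statistics

# Assumed data specification for this example: dates look like YYYY-MM-DD.
DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")


def is_valid_date(text):
    """Parsing: accept the string only if it matches the allowed syntax."""
    return DATE_PATTERN.fullmatch(text) is not None


def min_max_normalize(values):
    """Data transformation: rescale numbers into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are equal, so there is no spread to rescale.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def drop_duplicates(records, key):
    """Duplicate elimination: sort by key so duplicates become adjacent,
    then keep only the first record of each run."""
    result = []
    for record in sorted(records, key=key):
        if not result or key(result[-1]) != key(record):
            result.append(record)
    return result


def replace_outliers_with_mean(values, threshold=2.0):
    """Statistical methods: flag values more than `threshold` standard
    deviations from the mean and replace them with the mean of the rest."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return list(values)
    inliers = [v for v in values if abs(v - mean) <= threshold * stdev]
    if not inliers:
        return list(values)
    fill = statistics.mean(inliers)
    return [v if abs(v - mean) <= threshold * stdev else fill for v in values]
```

Replacing a flagged value with the mean of the remaining values is only one option; a median or a domain-specific default could be substituted, since the true value is not known in any case.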