Data Cleansing - Popular Methods Used

Popular Methods Used

  • Parsing: In data cleansing, parsing is used to detect syntax errors. A parser decides whether a string of data is acceptable under the allowed data specification, in the same way that a parser for a formal language decides whether a string is generated by its grammar.
  • Data transformation: Data transformation maps data from its given format into the format expected by the target application. This includes value conversions and translation functions, as well as normalizing numeric values so that they fall between a specified minimum and maximum.
  • Duplicate elimination: Duplicate detection requires an algorithm that determines whether the data contains more than one representation of the same entity. Usually the data is sorted by a key that places likely duplicates next to each other, so that they can be identified more quickly.
  • Statistical methods: By analyzing the data with measures such as the mean, standard deviation, and range, or with clustering algorithms, an expert can find values that are unexpected and therefore likely erroneous. Correcting such values is difficult because the true value is not known, but they can be replaced with an average or another statistical value. Statistical methods can also be used to handle missing values, which can be replaced by one or more plausible values, usually obtained by extensive data augmentation algorithms.
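The parsing approach described above can be sketched in Python, assuming a hypothetical record specification of the form "<id>,<YYYY-MM-DD>,<amount>". The pattern, the function name parse_record and the sample rows are illustrative only. A regular expression plays the role of the grammar, and a second check rejects strings that are syntactically well formed but still not valid dates.

```python
import re
from datetime import date

# Hypothetical specification: "<alphanumeric id>,<YYYY-MM-DD>,<decimal amount>"
RECORD_PATTERN = re.compile(r"([A-Za-z0-9]+),(\d{4})-(\d{2})-(\d{2}),(-?\d+(?:\.\d+)?)")

def parse_record(line):
    """Return (id, date, amount) if the line conforms to the spec, else None."""
    match = RECORD_PATTERN.fullmatch(line.strip())
    if match is None:
        return None  # syntax error: the string is not accepted by the "grammar"
    rec_id, year, month, day, amount = match.groups()
    try:
        when = date(int(year), int(month), int(day))
    except ValueError:
        return None  # well formed, but not a real calendar date (e.g. 2023-02-30)
    return rec_id, when, float(amount)

rows = ["A17,2023-05-01,19.99", "B2,2023-02-30,5", "C3;2023-01-01;7"]
for row in rows:
    print(row, "->", "valid" if parse_record(row) else "rejected")
```

In this sketch a rejected row is simply reported; a real pipeline might instead log it or route it to manual review.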
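The data transformation step can be illustrated with three small functions: a translation table, a unit conversion, and min-max normalization into a target range. The country table, the Fahrenheit-to-Celsius conversion and all function names are hypothetical examples chosen for illustration, not part of any particular tool.

```python
# Hypothetical translation table: free-text country names -> ISO 3166 codes
COUNTRY_CODES = {"germany": "DE", "deutschland": "DE", "united states": "US", "usa": "US"}

def translate_country(name):
    """Translation function: map a free-text name to a code, or None if unknown."""
    return COUNTRY_CODES.get(name.strip().lower())

def fahrenheit_to_celsius(f):
    """Value conversion between units."""
    return (f - 32) * 5 / 9

def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Rescale values linearly so they span [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [new_min for _ in values]  # constant input: no spread to rescale
    return [new_min + (v - lo) * (new_max - new_min) / (hi - lo) for v in values]

print(translate_country(" Deutschland "))
print(fahrenheit_to_celsius(212))
print(min_max_normalize([10, 15, 20]))
```

Min-max scaling is only one way to bring values into a fixed range; it is sensitive to extreme values, which is one reason outlier handling (discussed under statistical methods) is often done first.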
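The idea of sorting by a key so that duplicates end up close together is the basis of the "sorted neighborhood" approach. A minimal sketch follows, assuming person records with hypothetical fields id, first and last, a sort key of (last name, first name), and a string-similarity threshold of 0.85 computed with the standard library's difflib.SequenceMatcher. All of these choices are illustrative assumptions; only records within a small window of each other after sorting are compared.

```python
from difflib import SequenceMatcher

def normalize(text):
    """Lower-case and collapse whitespace so trivial differences don't matter."""
    return " ".join(text.lower().split())

def sort_key(record):
    # Hypothetical key: normalized last name, then first name
    return (normalize(record["last"]), normalize(record["first"]))

def is_duplicate(a, b, threshold=0.85):
    full_a = normalize(a["first"] + " " + a["last"])
    full_b = normalize(b["first"] + " " + b["last"])
    return SequenceMatcher(None, full_a, full_b).ratio() >= threshold

def find_duplicates(records, window=3):
    """Sort by key, then compare each record only with its next window-1 neighbours."""
    ordered = sorted(records, key=sort_key)
    pairs = []
    for i, rec in enumerate(ordered):
        for other in ordered[i + 1 : i + window]:
            if is_duplicate(rec, other):
                pairs.append((rec["id"], other["id"]))
    return pairs

records = [
    {"id": 1, "first": "Jon", "last": "Smith"},
    {"id": 2, "first": "Mary", "last": "Jones"},
    {"id": 3, "first": "John", "last": "Smith"},
    {"id": 4, "first": "Anna", "last": "Brown"},
]
print(find_duplicates(records))
```

The window keeps the number of comparisons roughly linear in the number of records instead of comparing every pair, at the cost of missing duplicates whose keys sort far apart.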
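The statistical approach can be sketched with the mean and standard deviation: values more than k standard deviations from the mean are treated as erroneous, and both those values and missing entries (None) are replaced with the mean of the remaining values. The threshold k = 2, the function name clean_series and the sample readings are assumptions for illustration. This is single mean imputation, a much simpler alternative to the multiple-value imputation mentioned above.

```python
import statistics

def clean_series(values, k=2.0):
    """Replace outliers (|z| > k) and missing values (None) with the mean of the inliers."""
    present = [v for v in values if v is not None]
    mu = statistics.mean(present)
    sigma = statistics.pstdev(present)  # population standard deviation

    def is_outlier(v):
        return sigma > 0 and abs(v - mu) / sigma > k

    inliers = [v for v in present if not is_outlier(v)]
    fill = statistics.mean(inliers)  # replacement value for outliers and gaps
    return [fill if v is None or is_outlier(v) else v for v in values]

readings = [10, 12, 11, 13, 12, 11, 95, 12, None, 10]
print(clean_series(readings))
```

Because a large outlier also inflates the standard deviation, z-score rules can miss outliers in small samples; median-based measures are a common, more robust alternative.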
