Data Cleansing - Popular Methods Used

Popular Methods Used

  • Parsing: Parsing in data cleansing is performed for the detection of syntax errors. A parser decides whether a string of data is acceptable within the allowed data specification. This is similar to the way a parser works with grammars and languages.
  • Data transformation: Data transformation allows the mapping of the data from its given format into the format expected by the appropriate application. This includes value conversions or translation functions, as well as normalizing numeric values to conform to minimum and maximum values.
  • Duplicate elimination: Duplicate detection requires an algorithm for determining whether data contains duplicate representations of the same entity. Usually, data is sorted by a key that would bring duplicate entries closer together for faster identification.
  • Statistical methods: By analyzing the data using the values of mean, standard deviation, range, or clustering algorithms, it is possible for an expert to find values that are unexpected and thus erroneous. Although the correction of such data is difficult since the true value is not known, it can be resolved by setting the values to an average or other statistical value. Statistical methods can also be used to handle missing values which can be replaced by one or more plausible values, which are usually obtained by extensive data augmentation algorithms.

Read more about this topic:  Data Cleansing

Famous quotes containing the words popular and/or methods:

    Party action should follow, not precede the creation of a dominant popular sentiment.
    J. Ellen Foster (1840–1910)

    A writer who writes, “I am alone” ... can be considered rather comical. It is comical for a man to recognize his solitude by addressing a reader and by using methods that prevent the individual from being alone. The word alone is just as general as the word bread. To pronounce it is to summon to oneself the presence of everything the word excludes.
    Maurice Blanchot (b. 1907)