The Process of Data Cleansing
- Data auditing: The data is audited using statistical and database methods to detect anomalies and contradictions; this yields an indication of the characteristics of the anomalies and their locations. Several commercial software packages let you specify constraints of various kinds (using a grammar that conforms to that of a standard programming language, e.g., JavaScript or Visual Basic) and then generate code that checks the data for violations of these constraints. This process is covered below under "Workflow specification" and "Workflow execution." For users who lack access to high-end cleansing software, microcomputer database packages such as Microsoft Access or FileMaker Pro also let you perform such checks interactively, one constraint at a time, in many cases with little or no programming.
- Workflow specification: The detection and removal of anomalies is performed by a sequence of operations on the data known as the workflow. It is specified after the process of auditing the data and is crucial in achieving the end product of high-quality data. In order to achieve a proper workflow, the causes of the anomalies and errors in the data have to be closely considered.
- Workflow execution: In this stage, the workflow is executed after its specification is complete and its correctness has been verified. The implementation should be efficient even on large data sets, which inevitably involves a trade-off, because executing a data-cleansing operation can be computationally expensive.
- Post-processing and controlling: After executing the cleansing workflow, the results are inspected to verify correctness. Data that could not be corrected during execution of the workflow is manually corrected, if possible. The result is a new cycle in the data-cleansing process where the data is audited again to allow the specification of an additional workflow to further cleanse the data by automatic processing.
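The four stages above can be illustrated with a minimal Python sketch. It is not the output of any particular cleansing package: the field names (`age`, `email`), the two constraints, and the helper functions `audit`, `execute`, `normalize_email`, and `coerce_age` are all hypothetical, chosen only to show an audit, a specified workflow, its execution, and a second audit that leaves the uncorrectable records for manual review.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, object]

# Hypothetical constraints, expressed as named predicates over one record.
CONSTRAINTS: Dict[str, Callable[[Record], bool]] = {
    "age_in_range": lambda r: isinstance(r.get("age"), int) and 0 <= r["age"] <= 120,
    "email_has_at": lambda r: isinstance(r.get("email"), str) and "@" in r["email"],
}


def audit(records: List[Record]) -> List[Tuple[int, str]]:
    """Data auditing: report (row index, constraint name) for every violation."""
    return [
        (i, name)
        for i, record in enumerate(records)
        for name, check in CONSTRAINTS.items()
        if not check(record)
    ]


def normalize_email(record: Record) -> Record:
    """Workflow step: trim whitespace and lowercase the email field."""
    out = dict(record)  # copy so the input record is left untouched
    if isinstance(out.get("email"), str):
        out["email"] = out["email"].strip().lower()
    return out


def coerce_age(record: Record) -> Record:
    """Workflow step: convert digit-only age strings to integers."""
    out = dict(record)
    age = out.get("age")
    if isinstance(age, str) and age.strip().isdigit():
        out["age"] = int(age.strip())
    return out


# Workflow specification: an ordered sequence of operations on the data.
WORKFLOW: List[Callable[[Record], Record]] = [normalize_email, coerce_age]


def execute(records: List[Record], workflow: List[Callable[[Record], Record]]) -> List[Record]:
    """Workflow execution: apply each step, in order, to every record."""
    cleaned = records
    for step in workflow:
        cleaned = [step(r) for r in cleaned]
    return cleaned


records: List[Record] = [
    {"age": "34", "email": " Ann@Example.com "},  # fixable by the workflow
    {"age": 230, "email": "bob@example.com"},     # out of range, not fixable here
    {"age": 41, "email": "carol.example.com"},    # malformed email, not fixable here
]

before = audit(records)             # first audit
cleaned = execute(records, WORKFLOW)
after = audit(cleaned)              # post-processing: audit again

print("violations before:", before)
print("violations after: ", after)
```

In this sketch the first record is repaired automatically, while the other two still violate a constraint after execution. Those are the records that would be corrected manually, or that would motivate an additional workflow in the next cycle.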