Data Scraping - Description

Description

Normally, data transfer between programs is accomplished using data structures suited for automated processing by computers, not people. Such interchange formats and protocols are typically rigidly structured, well-documented, easily parsed, and keep ambiguity to a minimum. Very often, these transmissions are not human-readable at all.

Thus, the key element that distinguishes data scraping from regular parsing is that the output being scraped was intended for display to an end-user, rather than as input to another program, and is therefore usually neither documented nor structured for convenient parsing. Data scraping often involves ignoring binary data (usually images or multimedia data), display formatting, redundant labels, superfluous commentary, and other information which is either irrelevant or hinders automated processing.

Data scraping is most often done to either (1) interface to a legacy system which has no other mechanism which is compatible with current hardware, or (2) interface to a third-party system which does not provide a more convenient API. In the second case, the operator of the third-party system may even see screen scraping as unwanted, due to reasons such as increased system load, the loss of advertisement revenue, or the loss of control of the information content.

Data scraping is generally considered an ad hoc, inelegant technique, often used only as a "last resort" when no other mechanism is available. Aside from the higher programming and processing overhead, output displays intended for human consumption often change structure frequently. Humans can cope with this easily, but computer programs will often crash or produce incorrect results.

Read more about this topic:  Data Scraping

Famous quotes containing the word description:

    The great object in life is Sensation—to feel that we exist, even though in pain; it is this “craving void” which drives us to gaming, to battle, to travel, to intemperate but keenly felt pursuits of every description whose principal attraction is the agitation inseparable from their accomplishment.
    George Gordon Noel Byron (1788–1824)

    I was here first introduced to Joe.... He was a good-looking Indian, twenty-four years old, apparently of unmixed blood, short and stout, with a broad face and reddish complexion, and eyes, methinks, narrower and more turned up at the outer corners than ours, answering to the description of his race. Besides his underclothing, he wore a red flannel shirt, woolen pants, and a black Kossuth hat, the ordinary dress of the lumberman, and, to a considerable extent, of the Penobscot Indian.
    Henry David Thoreau (1817–1862)

    Whose are the truly labored sentences? From the weak and flimsy periods of the politician and literary man, we are glad to turn even to the description of work, the simple record of the month’s labor in the farmer’s almanac, to restore our tone and spirits.
    Henry David Thoreau (1817–1862)