Backup - Manipulation of Data and Dataset Optimization

Manipulation of Data and Dataset Optimization

It is frequently useful or required to manipulate the data being backed up to optimize the backup process. These manipulations can provide many benefits including improved backup speed, restore speed, data security, media usage and/or reduced bandwidth requirements.

Compression
Various schemes can be employed to shrink the size of the source data to be stored so that it uses less storage space. Compression is frequently a built-in feature of tape drive hardware.
Deduplication
When multiple similar systems are backed up to the same destination storage device, there exists the potential for much redundancy within the backed up data. For example, if 20 Windows workstations were backed up to the same data repository, they might share a common set of system files. The data repository only needs to store one copy of those files to be able to restore any one of those workstations. This technique can be applied at the file level or even on raw blocks of data, potentially resulting in a massive reduction in required storage space. Deduplication can occur on a server before any data moves to backup media, sometimes referred to as source/client side deduplication. This approach also reduces bandwidth required to send backup data to its target media. The process can also occur at the target storage device, sometimes referred to as inline or back-end deduplication.
Duplication
Sometimes backup jobs are duplicated to a second set of storage media. This can be done to rearrange the backup images to optimize restore speed or to have a second copy at a different location or on a different storage medium.
Encryption
High capacity removable storage media such as backup tapes present a data security risk if they are lost or stolen. Encrypting the data on these media can mitigate this problem, but presents new problems. Encryption is a CPU intensive process that can slow down backup speeds, and the security of the encrypted backups is only as effective as the security of the key management policy.
Multiplexing
When there are many more computers to be backed up than there are destination storage devices, the ability to use a single storage device with several simultaneous backups can be useful.
Refactoring
The process of rearranging the backup sets in a data repository is known as refactoring. For example, if a backup system uses a single tape each day to store the incremental backups for all the protected computers, restoring one of the computers could potentially require many tapes. Refactoring could be used to consolidate all the backups for a single computer onto a single tape. This is especially useful for backup systems that do incrementals forever style backups.
Staging
Sometimes backup jobs are copied to a staging disk before being copied to tape. This process is sometimes referred to as D2D2T, an acronym for Disk to Disk to Tape. This can be useful if there is a problem matching the speed of the final destination device with the source device as is frequently faced in network-based backup systems. It can also serve as a centralized location for applying other data manipulation techniques.

Read more about this topic:  Backup

Famous quotes containing the words manipulation of, manipulation and/or data:

    When we say “science” we can either mean any manipulation of the inventive and organizing power of the human intellect: or we can mean such an extremely different thing as the religion of science the vulgarized derivative from this pure activity manipulated by a sort of priestcraft into a great religious and political weapon.
    Wyndham Lewis (1882–1957)

    When we say “science” we can either mean any manipulation of the inventive and organizing power of the human intellect: or we can mean such an extremely different thing as the religion of science the vulgarized derivative from this pure activity manipulated by a sort of priestcraft into a great religious and political weapon.
    Wyndham Lewis (1882–1957)

    This city is neither a jungle nor the moon.... In long shot: a cosmic smudge, a conglomerate of bleeding energies. Close up, it is a fairly legible printed circuit, a transistorized labyrinth of beastly tracks, a data bank for asthmatic voice-prints.
    Susan Sontag (b. 1933)