Content-addressable Storage - Typical Implementation

Typical Implementation

Paul Carpentier and Jan van Riel coined the term CAS while working at a company called FilePool in the late 1990s. FilePool was acquired in 2001 and became the underpinnings of the first commercially available CAS system, which was introduced as EMC's Centera platform. Paul and Jan are now working together again at Caringo which has introduced advancements in CAS technology with the CAStor content storage software. The Centera CAS system consists of a series of networked nodes (1-U servers running Linux), divided between storage nodes and access nodes. The access nodes maintain a synchronized directory of content addresses, and the corresponding storage node where each address can be found. When a new data element, or blob (Binary large object), is added, the device calculates a hash of the content and returns this hash as the blob's content address. As mentioned above, the hash is searched for to verify that identical content is not already present. If the content already exists, the device does not need to perform any additional steps; the content address already points to the proper content. Otherwise, the data is passed off to a storage node and written to the physical media.

When a content address is provided to the device, it first queries the directory for the physical location of the specified content address. The information is then retrieved from a storage node, and the actual hash of the data recomputed and verified. Once this is complete, the device can supply the requested data to the client. Within the Centera system, each content address actually represents a number of distinct data blobs, as well as optional metadata. Whenever a client adds an additional blob to an existing content block, the system recomputes the content address.

To provide additional data security, the Centera access nodes, when no read or write operation is in progress, constantly communicate with the storage nodes, checking the presence of at least two copies of each blob as well as their integrity. Additionally, they can be configured to exchange data with a different, e.g. off-site, Centera system, thereby strengthening the precautions against accidental data loss.

IBM has another flavor of CAS which can be software based, Tivoli Storage manager 5.3, or hardware based, the IBM DR550. The architecture is different in that it is based on a hierarchical storage management (HSM) design which provides some additional flexibility such as being able to support not only WORM disk but WORM tape and the migration of data from WORM disk to WORM tape and vice versa. This provides for additional flexibility in disaster recovery situations as well as the ability to reduce storage costs by moving data off disk to tape.

Another typical implementation is from iTernity. The concept of iTernity bases of container, each container is addressed by its hash value. A container is a multiple number of fixed content documents, so one container is not changeable and the hash value is fixed after the write process.

Read more about this topic:  Content-addressable Storage

Famous quotes containing the word typical:

    Compare the history of the novel to that of rock ‘n’ roll. Both started out a minority taste, became a mass taste, and then splintered into several subgenres. Both have been the typical cultural expressions of classes and epochs. Both started out aggressively fighting for their share of attention, novels attacking the drama, the tract, and the poem, rock attacking jazz and pop and rolling over classical music.
    W. T. Lhamon, U.S. educator, critic. “Material Differences,” Deliberate Speed: The Origins of a Cultural Style in the American 1950s, Smithsonian (1990)