MapReduce - Distribution and Reliability

MapReduce achieves reliability by parceling out operations on the data set to each node in the network. Each node is expected to report back periodically with completed work and status updates. If a node falls silent for longer than that reporting interval, the master node (similar to the master server in the Google File System) records the node as dead and reassigns the node's work to other nodes. Individual operations name their output files with atomic operations, as a check that conflicting parallel threads are not running. When files are renamed, they can also be copied to a name other than the task's own (allowing for side effects).
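The two mechanisms described above, failure detection by missed status reports and atomic naming of output files, can be illustrated with a minimal Python sketch. It is not taken from any real MapReduce implementation: the `Master` class, its method names, the `commit_output` helper, and the `.out` file suffix are all hypothetical, and the round-robin reassignment policy is an assumption chosen for brevity.

```python
import os
import tempfile
import time


class Master:
    """Toy master node (hypothetical): tracks status reports and reassigns work."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s  # silence longer than this marks a worker dead
        self.clock = clock          # injectable clock, so the sketch is testable
        self.last_seen = {}         # worker id -> time of last report
        self.assigned = {}          # worker id -> set of outstanding task ids
        self.dead = set()

    def register(self, worker):
        self.last_seen[worker] = self.clock()
        self.assigned.setdefault(worker, set())

    def assign(self, worker, task):
        self.assigned[worker].add(task)

    def heartbeat(self, worker, completed=()):
        """Record a status report and drop the tasks the worker has finished."""
        if worker in self.dead:
            return  # reports from workers already declared dead are ignored
        self.last_seen[worker] = self.clock()
        self.assigned[worker].difference_update(completed)

    def reap(self):
        """Mark silent workers dead; move their tasks to surviving workers."""
        now = self.clock()
        for worker, seen in self.last_seen.items():
            if now - seen > self.timeout_s:
                self.dead.add(worker)
        survivors = [w for w in self.last_seen if w not in self.dead]
        moved = {}
        if not survivors:
            return moved  # nobody left to take the work; tasks stay where they are
        for worker in self.dead:
            # Assumed policy: spread orphaned tasks round-robin over survivors.
            for i, task in enumerate(sorted(self.assigned[worker])):
                target = survivors[i % len(survivors)]
                self.assigned[target].add(task)
                moved[task] = target
            self.assigned[worker] = set()
        return moved


def commit_output(task_id, data, out_dir):
    """Write to a private temporary file, then rename it to the final name.

    os.replace is atomic on POSIX when both paths are on the same filesystem,
    so readers see either no output file or a complete one, even if two
    attempts of the same task run in parallel.
    """
    final_path = os.path.join(out_dir, f"{task_id}.out")
    fd, tmp_path = tempfile.mkstemp(dir=out_dir, prefix=f"{task_id}.", suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        f.write(data)
    os.replace(tmp_path, final_path)
    return final_path
```

In this sketch a duplicate attempt of the same task simply overwrites the final file with its own complete copy, which is harmless as long as the task is deterministic.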

The reduce operations operate much the same way. Because reduce operations parallelize less well than map operations, the master node attempts to schedule each one on the node holding the data being operated on, or on a node in the same rack. This is desirable because it conserves bandwidth across the datacenter's backbone network.
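The locality preference described above (same node first, then same rack, then anywhere) can be sketched as a small selection function. This is an illustration under assumptions, not the scheduler of any particular implementation: `pick_worker`, its parameters, and the `rack_of` mapping from node name to rack name are hypothetical.

```python
def pick_worker(data_node, idle_workers, rack_of):
    """Return the idle worker closest to the node that holds the data.

    Preference order: the data node itself, then a node in the same rack,
    then any other idle node. Returns None when no worker is idle.
    """
    if not idle_workers:
        return None

    def distance(worker):
        if worker == data_node:
            return 0  # data is local: no network transfer needed
        rack = rack_of.get(worker)
        if rack is not None and rack == rack_of.get(data_node):
            return 1  # same rack: traffic stays off the backbone
        return 2      # different rack: traffic crosses the backbone

    # min() keeps the first of several equally close workers.
    return min(idle_workers, key=distance)
```

For example, with `rack_of = {"n1": "r1", "n2": "r1", "n3": "r2"}` and data on `n1`, the function picks `n1` if it is idle, falls back to `n2` in the same rack, and only then to `n3`.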

Implementations are not necessarily highly reliable. For example, in Hadoop releases before 2.0, the NameNode is a single point of failure for the distributed filesystem (HDFS).
