Distribution and Reliability
MapReduce achieves reliability by parceling out operations on the data set to each node in the network. Each node is expected to report back periodically with completed work and status updates. If a node falls silent for longer than that interval, the master node (similar to the master server in the Google File System) records the node as dead and reassigns the node's work to other nodes. Individual operations name their output files atomically, as a check to ensure that conflicting parallel threads are not running. When files are renamed, it is also possible to copy them to another name in addition to the name of the task (allowing for side effects).
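The failure-detection and output-naming ideas above can be illustrated with a minimal Python sketch. It is not the code of any real MapReduce implementation: the `Master` class, its method names, the `timeout_s` parameter, and the `commit_output` helper are all hypothetical, and the sketch assumes a single-process master with an injectable clock. The write-then-rename step relies on `os.replace`, which is atomic on POSIX systems when source and destination are on the same filesystem.

```python
import os
import tempfile
import time


class Master:
    """Toy master that tracks worker heartbeats (hypothetical, for illustration)."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock       # injectable so the sketch can be tested
        self.last_seen = {}      # worker id -> time of last heartbeat
        self.assigned = {}       # worker id -> set of unfinished task ids
        self.pending = []        # tasks waiting to be handed to another worker

    def assign(self, worker, task):
        self.assigned.setdefault(worker, set()).add(task)
        self.last_seen.setdefault(worker, self.clock())

    def heartbeat(self, worker, completed=()):
        """Record a status report and drop the tasks the worker has finished."""
        self.last_seen[worker] = self.clock()
        self.assigned.setdefault(worker, set()).difference_update(completed)

    def reap_dead_workers(self):
        """Mark silent workers as dead and requeue their unfinished tasks."""
        now = self.clock()
        dead = [w for w, t in self.last_seen.items() if now - t > self.timeout_s]
        for worker in dead:
            self.pending.extend(sorted(self.assigned.pop(worker, set())))
            del self.last_seen[worker]
        return dead


def commit_output(data, final_path):
    """Write to a private temporary file, then atomically rename it into place."""
    directory = os.path.dirname(os.path.abspath(final_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        f.write(data)
    # If two attempts of the same task both finish, the last rename wins
    # and readers never observe a partially written output file.
    os.replace(tmp_path, final_path)
```

In this sketch, a driver loop would call `reap_dead_workers()` periodically and hand the tasks in `pending` to live workers. Because each attempt writes to its own temporary file, a re-executed task cannot corrupt the output of an earlier attempt.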
Reduce operations are handled in much the same way. Because they parallelize less well than map operations, the master node attempts to schedule each reduce operation on the node holding the data being operated on or, failing that, on a node in the same rack. This locality is desirable because it conserves bandwidth across the datacenter's backbone network.
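The locality preference described above can be sketched as a small selection function. This is an illustrative assumption about how such a policy might look, not the scheduler of any particular system: `pick_worker`, `idle_workers`, and the `rack_of` mapping are hypothetical names.

```python
def pick_worker(data_node, idle_workers, rack_of):
    """Choose a worker for a task, preferring locality to the data.

    Order of preference: the node holding the data, then any idle node
    in the same rack, then any idle node at all. Returns None if no
    worker is idle.
    """
    if not idle_workers:
        return None
    if data_node in idle_workers:
        return data_node  # node-local: no network transfer needed
    target_rack = rack_of.get(data_node)
    if target_rack is not None:
        for worker in idle_workers:
            if rack_of.get(worker) == target_rack:
                return worker  # rack-local: traffic stays off the backbone
    return idle_workers[0]  # remote: data must cross the backbone network
```

For example, with `rack_of = {"n1": "r1", "n2": "r1", "n3": "r2"}` and data on `n1`, the function picks `n1` if it is idle, otherwise `n2` from the same rack, and falls back to `n3` only when nothing closer is available.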
Implementations are not necessarily highly reliable. For example, in versions of Hadoop before 2.0, which introduced a high-availability option for the NameNode, the NameNode was a single point of failure for the distributed filesystem.