MapReduce - Distribution and Reliability

Distribution and Reliability

MapReduce achieves reliability by parceling out operations on the data set to each node in the network. Each node is expected to report back periodically with completed work and status updates. If a node falls silent for longer than that interval, the master node (similar to the master server in the Google File System) records the node as dead and reassigns that node's work to other nodes. Individual operations name their output files atomically, as a check to ensure that conflicting threads are not running in parallel. When files are renamed, they can also be copied to another name in addition to the name of the task (allowing for side effects).
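The failure handling and atomic output naming described above can be sketched as follows. This is a minimal illustration, not the code of any real implementation: the `Master` class, its method names, the `pending` queue, the round-robin reassignment policy, and the `commit_output` helper are all hypothetical. Time is passed in explicitly so the behavior is easy to follow.

```python
import os
import tempfile


class Master:
    """Toy master that tracks worker heartbeats (hypothetical design)."""

    def __init__(self, timeout):
        self.timeout = timeout   # seconds of silence before a worker counts as dead
        self.last_seen = {}      # worker name -> time of last heartbeat
        self.assigned = {}       # worker name -> set of task ids
        self.pending = []        # tasks waiting because no live worker exists

    def heartbeat(self, worker, now):
        """Record a status report from a worker."""
        self.last_seen[worker] = now
        self.assigned.setdefault(worker, set())

    def assign(self, worker, task):
        self.assigned[worker].add(task)

    def reap(self, now):
        """Declare silent workers dead and reassign their tasks."""
        dead = [w for w, t in self.last_seen.items() if now - t > self.timeout]
        orphaned = []
        for worker in dead:
            del self.last_seen[worker]
            orphaned.extend(sorted(self.assigned.pop(worker)))
        live = sorted(self.last_seen)
        for i, task in enumerate(orphaned):
            if live:
                # Assumed policy: spread orphaned tasks round-robin.
                self.assigned[live[i % len(live)]].add(task)
            else:
                self.pending.append(task)
        return dead


def commit_output(directory, final_name, data):
    """Write to a temporary file, then rename it into place.

    os.replace is an atomic rename on POSIX file systems, so a reader
    sees either no output file or the complete one, even if two
    attempts of the same task finish at nearly the same time.
    """
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        handle.write(data)
    final_path = os.path.join(directory, final_name)
    os.replace(tmp_path, final_path)
    return final_path
```

For example, if workers `a` and `b` both report at time 0 and only `a` reports again at time 15, calling `reap(20)` with a 10-second timeout marks `b` as dead and moves its tasks to `a`.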

The reduce operations work in much the same way. Because reduce operations parallelize less well, the master node attempts to schedule them on the node that holds the data being operated on, or failing that on a node in the same rack. This placement is desirable because it conserves bandwidth across the backbone network of the datacenter.
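A locality preference of this kind can be sketched as a simple three-tier choice: same node, then same rack, then anywhere. The function name `pick_node`, its parameters, and the `rack_of` mapping are assumptions made for illustration and do not correspond to any particular scheduler's API.

```python
def pick_node(data_node, idle_nodes, rack_of):
    """Choose where to run a task whose input lives on data_node.

    idle_nodes is a list of nodes that can accept work; rack_of maps
    each node name to its rack name. Returns None if nothing is idle.
    """
    # Best case: the node holding the data is free, so no network copy is needed.
    if data_node in idle_nodes:
        return data_node
    # Next best: a node in the same rack, which keeps traffic off the backbone.
    for node in idle_nodes:
        if rack_of[node] == rack_of[data_node]:
            return node
    # Fallback: any idle node, at the cost of cross-rack bandwidth.
    return idle_nodes[0] if idle_nodes else None
```

With `rack_of = {"n1": "r1", "n2": "r1", "n3": "r2"}` and data on `n1`, the call `pick_node("n1", ["n3", "n2"], rack_of)` selects `n2`, because it shares a rack with `n1` while `n3` does not.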

Implementations are not necessarily highly reliable. For example, in Hadoop versions before 2.0 the NameNode was a single point of failure for the distributed filesystem.
