Hadoop On Amazon EC2/S3 Services
It is possible to run Hadoop on Amazon Elastic Compute Cloud (EC2) and Amazon Simple Storage Service (S3). As an example The New York Times used 100 Amazon EC2 instances and a Hadoop application to process 4 TB of raw image TIFF data (stored in S3) into 11 million finished PDFs in the space of 24 hours at a computation cost of about $240 (not including bandwidth).
There is support for the S3 filesystem in Hadoop distributions, and the Hadoop team generates EC2 machine images after every release. From a pure performance perspective, Hadoop on S3/EC2 is inefficient, as the S3 filesystem is remote and delays returning from every write operation until the data is guaranteed not to be lost. This removes the locality advantages of Hadoop, which schedules work near data to save on network load.
Read more about this topic: Apache Hadoop
Famous quotes containing the word services:
“We now in the United States have more security guards for the rich than we have police services for the poor districts. If youre looking for personal security, far better to move to the suburbs than to pay taxes in New York.”
—John Kenneth Galbraith (b. 1908)