MapReduce Cloud Computing Question:

How does fault tolerance work in MapReduce?

Answer:

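In a MapReduce job, the master pings each worker periodically. If a worker does not respond within a set time, the master marks that worker as failed. Any task in progress on the failed worker is reset and rescheduled on another worker. Completed map tasks are rescheduled as well, because their output was stored on the local disk of the failed worker and can no longer be reached. Completed reduce tasks do not need to be re-executed, since their output is written to a global file system. This is how MapReduce handles large-scale worker failures: it simply re-executes the affected tasks. The master can also save its own state at periodic checkpoints, so that if the master fails, a new copy can be restarted from the last checkpoint.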
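
The worker-failure handling described above can be sketched roughly as follows. This is a minimal illustration in Python, not the code of any real MapReduce implementation: the names `Task`, `Master`, `record_ping`, `check_workers` and the `timeout` parameter are all hypothetical, and timestamps are passed in explicitly so the logic is easy to follow.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Task:
    task_id: str
    kind: str                     # "map" or "reduce"
    state: str = "idle"           # "idle", "in_progress" or "completed"
    worker: Optional[str] = None  # worker the task is (or was) assigned to


class Master:
    """Hypothetical master that tracks ping replies and reschedules tasks."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout                 # assumed ping timeout, in seconds
        self.last_seen: Dict[str, float] = {}  # worker -> time of last ping reply
        self.tasks: Dict[str, Task] = {}

    def add_task(self, task: Task) -> None:
        self.tasks[task.task_id] = task

    def assign(self, task_id: str, worker: str) -> None:
        task = self.tasks[task_id]
        task.state, task.worker = "in_progress", worker

    def complete(self, task_id: str) -> None:
        self.tasks[task_id].state = "completed"

    def record_ping(self, worker: str, now: float) -> None:
        self.last_seen[worker] = now

    def check_workers(self, now: float) -> List[str]:
        """Mark silent workers as failed and reset the tasks they held."""
        failed = [w for w, seen in self.last_seen.items()
                  if now - seen > self.timeout]
        for worker in failed:
            del self.last_seen[worker]
            for task in self.tasks.values():
                if task.worker != worker:
                    continue
                # In-progress tasks are lost. Completed map output lived on
                # the failed worker's local disk, so it must be redone too.
                # Completed reduce output is assumed to be in a global file
                # system and is left alone.
                lost_map_output = task.state == "completed" and task.kind == "map"
                if task.state == "in_progress" or lost_map_output:
                    task.state, task.worker = "idle", None
        return failed
```

In this sketch, a task that returns to the "idle" state is simply eligible to be assigned to another worker on the next scheduling pass. A real master would also have to tell reduce workers where to fetch the re-executed map output.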
