1
votes

I know that for iterative algorithms, Hadoop MapReduce doesn't perform well since it does a complete disk read/write for each iteration. But why? Is that for the robustness of the system?

1
Do you have any examples? - OneCricketeer

1 Answer

2
votes

Your question is a little broad, but I will still try to answer it.

Hadoop performs disk read/write operations for any algorithm because it was built around disk-oriented processing; that is one of its core principles. It is also one of the reasons Spark was developed: to move computation from disk to memory and thereby reduce the latency overhead of disk-oriented processing.
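To illustrate the difference, here is a minimal PySpark sketch of an iterative computation that keeps its working data in memory. The file name points.txt and the update rule are just placeholder assumptions; the point is that cache() lets every iteration reuse the in-memory dataset instead of re-reading from disk.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("iterative-sketch").getOrCreate()
sc = spark.sparkContext

# Parse the input once and cache it; later iterations reuse the in-memory
# copy instead of re-reading the file from disk on every pass.
data = sc.textFile("points.txt").map(float).cache()

estimate = 0.0
for _ in range(10):
    current = estimate
    # One full pass over the cached data per iteration (hypothetical update rule).
    delta = data.map(lambda x, c=current: x - c).mean()
    estimate = current + 0.5 * delta

print(estimate)
spark.stop()
```

In classic MapReduce, the same loop would typically be a chain of jobs, each writing its result to HDFS so the next job can read it back, which is exactly the per-iteration disk cost the question refers to.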

Now, this read/write to disk on each MapReduce iteration contributes to the robustness and reliability of the system. Consider the simplest example: a worker node has 2 containers, which means two separate JVMs running on the same machine, both accessing the same data source available on that node. If Hadoop did not read/write to disk for each change, the second container could access data that had not yet been updated, which could lead to corrupt or inconsistent output. That is one of the reasons Hadoop reads from and writes to disk on each iteration of an iterative MapReduce algorithm.
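To make that per-iteration round trip concrete, here is a plain-Python sketch of the pattern (local files stand in for HDFS; the file names and update rule are hypothetical, not real Hadoop APIs): each iteration reads the previous iteration's persisted output and writes its own result before the next one starts, so any other task reading that path sees a complete, consistent copy.

```python
# Seed the first "iteration output" on disk (hypothetical toy data).
with open("iter_0.txt", "w") as f:
    f.write("\n".join(["1.0", "2.0", "3.0"]))

def run_iteration(in_path, out_path):
    # Read the previous iteration's persisted state...
    with open(in_path) as f:
        values = [float(line) for line in f]
    # ...apply a hypothetical per-record update...
    updated = [v * 0.9 + 1.0 for v in values]
    # ...and persist the result before the next iteration begins.
    with open(out_path, "w") as f:
        f.write("\n".join(str(v) for v in updated))

prev = "iter_0.txt"
for i in range(1, 4):
    nxt = f"iter_{i}.txt"
    run_iteration(prev, nxt)  # every iteration pays a full read and a full write
    prev = nxt
```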

Hope this answers your query.