I've done some research and I'm trying to figure out what the typical use case for Hadoop is. What I've understood so far is that it is best suited to batch processing, when the data size is at least in the order of terabytes, the sources are unstructured, and the algorithm is a sequential scan over the records, like counting the occurrences of words in many documents... At a high level, my understanding is that the key point is to move the code toward the data nodes instead of the opposite, traditional approach of moving the data to the code.
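For concreteness, this is roughly what I mean by the word-count example, as a minimal sketch against the standard Hadoop `org.apache.hadoop.mapreduce` API (the class names are my own, and I've left out the `Job` driver setup):

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// The mapper runs on the node holding the input split ("code moves to data"):
// it emits (word, 1) for every token in its local block of the input.
public class WordCountMapper
        extends Mapper<LongWritable, Text, Text, IntWritable> {

    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        StringTokenizer tokens = new StringTokenizer(line.toString());
        while (tokens.hasMoreTokens()) {
            word.set(tokens.nextToken());
            context.write(word, ONE);   // emit (word, 1)
        }
    }
}

// The reducer receives, after the shuffle, all the counts for one word
// and sums them into the final total for that word.
class WordCountReducer
        extends Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    protected void reduce(Text word, Iterable<IntWritable> counts, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable c : counts) {
            sum += c.get();
        }
        context.write(word, new IntWritable(sum));  // (word, total count)
    }
}
```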
But:

1) I still fail to see, in simple terms, why other classical parallel programming implementations should not reach similar performance, and

2) I wonder whether Hadoop's MapReduce paradigm is also applicable to use cases in which the data size is smaller (even though the sources are still unstructured), or, if not, what the more appropriate technology would be in that case?