I have a question regarding the MapReduce example explained here:

http://forge.fiware.org/plugins/mediawiki/wiki/fiware/index.php/BigData_Analysis_-_Quick_Start_for_Programmers

It is indeed the most common example of Hadoop MapReduce: WordCount (a sketch of the standard code is included below for reference).

I am able to execute it with no problems on the global instance of Cosmos, but even when I give it a small input (a file with 2 or 3 lines) it takes a long time to execute (half a minute, more or less). I assume this is its normal behavior, but my question is: why does it take so long even for a small input?

I guess this approach becomes more effective with bigger datasets, where this minimal delay is negligible.
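
For reference, here is a minimal sketch in the spirit of the classic Hadoop WordCount (this is the standard example code; the tutorial's version may differ in details such as class or package names):

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {

        // Mapper: emits (word, 1) for every token in the input
        public static class TokenizerMapper
                extends Mapper<Object, Text, Text, IntWritable> {
            private final static IntWritable one = new IntWritable(1);
            private final Text word = new Text();

            public void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                StringTokenizer itr = new StringTokenizer(value.toString());
                while (itr.hasMoreTokens()) {
                    word.set(itr.nextToken());
                    context.write(word, one);
                }
            }
        }

        // Reducer: sums the counts emitted for each word
        public static class IntSumReducer
                extends Reducer<Text, IntWritable, Text, IntWritable> {
            private final IntWritable result = new IntWritable();

            public void reduce(Text key, Iterable<IntWritable> values,
                    Context context) throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable val : values) {
                    sum += val.get();
                }
                result.set(sum);
                context.write(key, result);
            }
        }

        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = Job.getInstance(conf, "word count");
            job.setJarByClass(WordCount.class);
            job.setMapperClass(TokenizerMapper.class);
            job.setCombinerClass(IntSumReducer.class);
            job.setReducerClass(IntSumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }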

1 Answer

First of all, you have to take into account that the current instance of Cosmos at FIWARE LAB is a shared Hadoop instance, thus many other users may be executing MapReduce jobs at the same time, resulting in "competition" for the computation resources.

That being said, MapReduce is designed for large datasets and large data files. It adds a lot of overhead (job submission, task scheduling, starting a JVM for each map and reduce task) that is not necessary when processing a couple of lines, because for a couple of lines you don't need MapReduce! :) But that machinery helps a lot when those lines become thousands, even millions. In those cases the processing time grows with the data size, of course, but not in, let's say, a 1:1 proportion.
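
To illustrate the point: for a file of just a few lines, a plain single-process count already does the job with none of that overhead. A minimal sketch (the input file name here is an assumption for illustration):

    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.HashMap;
    import java.util.Map;

    public class LocalWordCount {
        public static void main(String[] args) throws Exception {
            Map<String, Integer> counts = new HashMap<>();
            // Read the whole (small) file into memory and count tokens
            for (String line : Files.readAllLines(Paths.get("small-input.txt"))) {
                for (String word : line.split("\\s+")) {
                    if (!word.isEmpty()) {
                        counts.merge(word, 1, Integer::sum);
                    }
                }
            }
            counts.forEach((word, count) -> System.out.println(word + "\t" + count));
        }
    }

This runs in milliseconds for a 2- or 3-line file, which is exactly the fixed cost gap you are observing: the half minute you measured is almost entirely Hadoop's per-job setup, not the actual counting.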