0 votes

I have a 456 KB file which is read from HDFS and given as input to the mapper function. Every line contains an integer for which I am downloading some files and storing them on the local system. I have Hadoop set up on a two-node cluster, and the split size is changed from the program so that 8 mappers are opened:

    Configuration configuration = new Configuration();

    // Cap and floor the split size at ~60 KB so the ~456 KB input
    // is divided into about 8 input splits (one mapper per split).
    configuration.setLong("mapred.max.split.size", 60000L);
    configuration.setLong("mapred.min.split.size", 60000L);

8 mappers are created, but the same data is downloaded on both servers. I think it is happening because the block size is still set to the default 256 MB and the input file is processed twice. So my question is: can we process a small file with MapReduce?
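
Roughly, the rest of my driver wiring looks like this (the class names, mapper, and paths below are illustrative placeholders, not the exact code):

    Job job = new Job(configuration, "download-job");
    job.setJarByClass(DownloadDriver.class);     // placeholder driver class
    job.setMapperClass(DownloadMapper.class);    // placeholder mapper that downloads files per input line
    job.setNumReduceTasks(0);                    // map-only job, no reducers needed
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(NullWritable.class);

    FileInputFormat.addInputPath(job, new Path("/user/hadoop/ids.txt"));   // placeholder input path
    FileOutputFormat.setOutputPath(job, new Path("/user/hadoop/out"));     // placeholder output path

    System.exit(job.waitForCompletion(true) ? 0 : 1);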

The framework doesn't stop you from processing small files. But I didn't quite get what exactly you are trying to achieve. – Tariq
I want this single 456 KB file to be processed by a number of mappers; instead of being split, the entire file is processed on each server, so I am getting the same output on both servers, which should not happen. – mumbai
Are you using a custom InputFormat/RecordReader? – Mike Park
@climbage No. The input is just a txt file. – mumbai
So what InputFormat are you using? TextInputFormat? – Mike Park

1 Answer

1 vote

If your file downloads take time, you might be running into what's called speculative execution in Hadoop, which is enabled by default. It's just a guess, though, since you said you are getting the same files downloaded more than once.

With speculative execution turned on, the same input can be processed multiple times in parallel to exploit differences in machine capabilities. As most of the tasks in a job come to a close, the Hadoop platform schedules redundant copies of the remaining tasks across several nodes which do not have other work to perform.

You can disable speculative execution for the mappers and reducers by setting the mapred.map.tasks.speculative.execution and mapred.reduce.tasks.speculative.execution JobConf options to false, respectively.
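
For example, using the same configuration object as in your snippet (the commented-out typed setters on JobConf are equivalent):

    // Disable speculative execution for both map and reduce tasks
    configuration.setBoolean("mapred.map.tasks.speculative.execution", false);
    configuration.setBoolean("mapred.reduce.tasks.speculative.execution", false);

    // If you are working with a JobConf directly, the typed setters do the same thing:
    // jobConf.setMapSpeculativeExecution(false);
    // jobConf.setReduceSpeculativeExecution(false);

With these set to false, each input split is processed by exactly one task attempt, so the downloads should no longer be duplicated across nodes.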