0
votes

I have a map-reduce program in which the mappers generate multiple keys. According to the map-reduce framework, all pairs with the same key are sent to the same reducer. Let's say I have 10 keys in total and 3 reducers, so the reducers produce 3 output files at the end. Is there any technique to generate a separate output file for each key, so that I end up with 10 output files? I could use 10 reducers, but that approach may not be feasible as the number of keys grows.


2 Answers

0
votes

That doesn't sound like a very good idea. You'll face severe issues once you start using Hadoop for real stuff.

But if you still need it, why not skip the Reduce phase? Just emit the output directly from the Mappers, followed by a Combiner.

0
votes

If you are okay with using the old mapred API, there is an alternative:

You can extend MultipleTextOutputFormat and override its file-naming behaviour so that the key supplies the file name or path, while the entire content of the record goes into the value.

The oddjob library already provides such an implementation, MultipleLeafValueOutputFormat, but you can also implement it yourself.
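To make the idea concrete, here is a rough sketch in plain Java with no Hadoop dependency, so it only simulates the routing. The class `PerKeyOutputSketch` and its methods `fileNameFor` and `route` are hypothetical names made up for illustration. As far as I recall, in the old mapred API the naming logic would go into an override of `generateFileNameForKeyValue(key, value, name)` in your MultipleTextOutputFormat subclass, and overriding `generateActualKey` to return null keeps the key out of the written line. Treat both as assumptions and check them against your Hadoop version.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation of "one output file per key". Nothing here calls Hadoop.
class PerKeyOutputSketch {

    // Hypothetical helper: derives a relative output path from the record key.
    // leafName stands in for the default part name, e.g. "part-00000".
    // Assumption: in Hadoop this body would sit inside an override of
    // generateFileNameForKeyValue in a MultipleTextOutputFormat subclass.
    static String fileNameFor(String key, String leafName) {
        return key + "/" + leafName;
    }

    // Groups record values by the file name derived from each record's key.
    // Each record is a two-element array: {key, value}.
    static Map<String, List<String>> route(List<String[]> records, String leafName) {
        Map<String, List<String>> files = new TreeMap<>();
        for (String[] record : records) {
            String fileName = fileNameFor(record[0], leafName);
            files.computeIfAbsent(fileName, f -> new ArrayList<>()).add(record[1]);
        }
        return files;
    }

    public static void main(String[] args) {
        List<String[]> records = Arrays.asList(
            new String[] {"apple", "line 1"},
            new String[] {"pear", "line 2"},
            new String[] {"apple", "line 3"});

        // Prints each simulated output file with the values routed to it.
        route(records, "part-00000").forEach(
            (file, lines) -> System.out.println(file + " -> " + lines));
    }
}
```

Including the leaf name in the path keeps files written by different reduce tasks from colliding if the same key ever reaches more than one task. With the default partitioning each key goes to exactly one reducer, so you get one file per key either way.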

Read more about it here. Also read my similar answer here.