I am trying to run a Hadoop job on a very large amount of data, using up to 32 reducers. But when I look at the output of each reducer, I see that the same key sometimes appears in the output of more than one reducer (with different values, of course). Can this behavior be avoided while still using more than one reducer?
LE: I've tried using the Text class for the key instead, and that works correctly, but my JVM eventually crashes because it runs low on heap space. What criteria does Hadoop use to partition data into key groups, apart from compareTo?
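For reference, as far as I understand the default partitioner is HashPartitioner, which picks the target reducer from the key's hashCode(), not from compareTo(). Below is a sketch of what I believe it does (paraphrased from my understanding, not copied from the Hadoop sources), so please correct me if this is wrong:

```java
import org.apache.hadoop.mapreduce.Partitioner;

// Sketch of what I understand the default HashPartitioner to do:
// the reducer is chosen from the key's hashCode(), not from compareTo().
public class HashPartitionerSketch<K, V> extends Partitioner<K, V> {
  @Override
  public int getPartition(K key, V value, int numReduceTasks) {
    // Mask off the sign bit so the result is non-negative,
    // then bucket keys by the number of reducers.
    return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
  }
}
```

If that is right, then a custom key whose hashCode() is not consistent with equals()/compareTo() could explain equal keys ending up at different reducers. Is that the case here, or are there other criteria involved?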