1
votes

Suppose my mappers output N distinct keys, and I have K reducers. How can I write a custom Partitioner so that each reducer receives approximately N/K keys? Which keys go to which reducer is not important.

Example: Suppose my mappers output 10 pairs <k1,v1>, <k2,v2>, <k3,v3>, ..., <k10,v10>, and I have 3 reducers. I want 3 pairs to go to the 1st reducer, 3 pairs to the 2nd, and 4 pairs to the 3rd, regardless of which keys go to which reducer.

What I attempted:

  • Randomly assigning reducers. E.g., randomly assign <k1,v1> to the 1st reducer, <k2,v2> to the 2nd reducer, and so on. But some reducers still get much more data than others.
  • I do not want to hard-code which keys go to which reducers, because the keys k1, k2, ..., k10 from my mappers change with the input data, so I would have to change the code for each input. Moreover, all keys play the same role. I just need to distribute them equally among the reducers.

Thanks a lot.

1
Use logic such as partitioning by alphabetical order, etc. - Gaurav Varshney

1 Answer

0
votes

The default partitioner uses a hash function, which gives an approximately even distribution by design, so you won't get better results unless you know something about the data, e.g. the exact values of the keys that should be distributed.
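
To see what the hash-based assignment does with a given key set, here is a minimal sketch in plain Java with no Hadoop dependency. It uses the same arithmetic as Hadoop's HashPartitioner, `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`. The class name `HashPartitionSketch` and the helper `countPerReducer` are made up for illustration, and the keys "k1" to "k10" are taken from the question's example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical class name; plain Java, not an actual Hadoop Partitioner subclass.
class HashPartitionSketch {

    // Same arithmetic as Hadoop's HashPartitioner: clear the sign bit so the
    // value is non-negative, then take the remainder by the reducer count.
    static int getPartition(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Hypothetical helper: counts how many of the given keys land on each reducer.
    static int[] countPerReducer(List<String> keys, int numReduceTasks) {
        int[] counts = new int[numReduceTasks];
        for (String key : keys) {
            counts[getPartition(key, numReduceTasks)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // Keys k1..k10 as in the question's example, with 3 reducers.
        List<String> keys = new ArrayList<>();
        for (int i = 1; i <= 10; i++) {
            keys.add("k" + i);
        }
        System.out.println(Arrays.toString(countPerReducer(keys, 3)));
    }
}
```

Running this on a sample of your real keys shows how balanced the key counts are. With only a few keys, or keys whose hashCode values cluster, the counts can still come out uneven. Note also that an even number of keys per reducer does not imply an even amount of data if some keys carry many more values than others.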