In a Hadoop Reducer, I would like to create and emit new keys under specific conditions, and I'd like to ensure that these keys are unique.
The pseudo-code for what I want goes like:
@Override
protected void reduce(WritableComparable key, Iterable<Writable> values, Context context)
        throws IOException, InterruptedException {
    // do stuff:
    // ...
    // write the original key:
    context.write(key, data);
    // write an extra key when the condition is met:
    if (someConditionIsMet) {
        WritableComparable extraKey = createNewKey();
        context.write(extraKey, moreData);
    }
}
So I now have two questions:
- Is it possible at all to emit more than one different key from reduce()? I know that the keys won't be re-sorted, but that is fine for me.
- The extra key has to be unique across all reducers, both for application reasons and because I think it would otherwise violate the contract of the reduce stage. What is a good way to generate a key that is unique across reducers (and possibly across jobs)? Maybe get the reducer/job IDs and incorporate them into the key generation?
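For the second point, this is roughly what I had in mind, just a minimal sketch: the makeExtraKey() helper, the Text-based key format, and the per-reducer counter are my own placeholders, not anything prescribed by Hadoop.

    // inside the Reducer class; assumes org.apache.hadoop.io.Text is imported
    private long localCounter = 0;

    private Text makeExtraKey(Context context) {
        // The job ID is unique on the cluster and the task ID is unique within
        // the job, so combining them with a per-reducer counter should give a
        // key that no other reducer (or other job) can produce.
        String jobId = context.getJobID().toString();                    // e.g. "job_..."
        int reducerId = context.getTaskAttemptID().getTaskID().getId();  // this reducer's number
        return new Text(jobId + "_" + reducerId + "_" + localCounter++);
    }

Would relying on getTaskAttemptID() / getJobID() like this be considered safe, or is there a more standard way to do it?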