How do mapper/reducer instances get reused within a JVM that is kept alive indefinitely?
For example, let's say I wanted to do something like this:
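    import java.util.HashSet;
    import java.util.Set;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    public class MyMapper extends MapReduceBase implements Mapper<K1, V1, K2, V2> {
        // Instance state that accumulates across calls to map()
        private Set<String> set = new HashSet<String>();

        public void map(K1 k1, V1 v1, OutputCollector<K2, V2> output, Reporter reporter) {
            // ... do stuff ...
            set.add(k1.toString()); // add the key to the set so that it can be used later
            // ... do other stuff ...
            if (set.contains("someString"))
                emitSomeKindOfOutput(output);
            else
                emitSomeOtherKindOfOutput(output);
        }
    }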
If the same mapper instance can be reused across multiple tasks or jobs, the member set could cause problems, because it would still contain leftover entries from previous tasks or jobs. Is this kind of reuse possible in Hadoop? What about for reducers?
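To make the concern concrete, here is a minimal Hadoop-free sketch of the failure mode I have in mind. It does not show how Hadoop actually behaves; `StaleStateSketch`, `FakeMapper`, `runTask`, and `reset` are hypothetical names made up for illustration, and the sketch simply assumes that one instance ends up handling two consecutive "tasks".

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class StaleStateSketch {
    // Hypothetical stand-in for a mapper; this is NOT a Hadoop class.
    static class FakeMapper {
        private final Set<String> seen = new HashSet<String>();

        // Records the key, then reports whether "someString" has been seen so far.
        boolean map(String key) {
            seen.add(key);
            return seen.contains("someString");
        }

        // Hypothetical reset hook that clears the per-task state.
        void reset() {
            seen.clear();
        }
    }

    // Feeds one "task" worth of keys to the mapper and collects the results.
    static List<Boolean> runTask(FakeMapper mapper, String... keys) {
        List<Boolean> results = new ArrayList<Boolean>();
        for (String key : keys) {
            results.add(mapper.map(key));
        }
        return results;
    }

    public static void main(String[] args) {
        FakeMapper reused = new FakeMapper();

        // First "task" sees the special key.
        runTask(reused, "someString");

        // Second "task" never sees it, but the stale set still contains it.
        System.out.println(runTask(reused, "other")); // prints [true]

        // Clearing the state between tasks restores the expected behaviour.
        reused.reset();
        System.out.println(runTask(reused, "other")); // prints [false]
    }
}
```

If instances really are shared like this, I assume I would have to clear the set myself at the start of each task, which is what the `reset` call stands in for here.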