2 votes

I am trying to reuse a static HashMap inside the setup() method of a Hadoop job.

    private static Map<Long,String> amostraTable = null; // class variable, shared by mappers in the same JVM

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        if (amostraTable == null) {
            amostraTable = new HashMap<Long,String>();
            System.out.println("Hashmap allocated!");
            // ... populate amostraTable here ...
        } else {
            System.out.println("Hashmap reused");
            return;
        }
    }

I set mapreduce.job.jvm.numtasks=-1 because I just want to reuse the HashMap, but every mapper logs: Hashmap allocated!

Is there any other parameter to set? The mapper tasks are consuming too much CPU allocating and populating the HashMap.
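For context, this is roughly how the property is being set on the job (a minimal sketch; the class and job names are just placeholders, and as the later edits note, the setting is ignored on Hadoop 2.x / YARN):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class JvmReuseConfig {
        public static Job buildJob() throws Exception {
            Configuration conf = new Configuration();
            // -1 meant "reuse the JVM for an unlimited number of tasks" under MRv1;
            // Hadoop 2.x / YARN ignores it because JVM reuse was removed.
            conf.setInt("mapreduce.job.jvm.numtasks", -1);
            conf.setInt("mapred.job.reuse.jvm.num.tasks", -1); // older MRv1 name
            return Job.getInstance(conf, "amostra-job");
        }
    }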

EDIT: Look at this post: http://mail-archives.apache.org/mod_mbox/hadoop-mapreduce-user/201206.mbox/%3COFC497A21A.62B05EC6-ON85257A14.006F8FF6-85257A14.006FE8C7@freddiemac.com%3E

"If i understood correctly, then if I initialize a static variable (say var) in setup() and when mapper is started for the 2nd time on same JVM, the that var would be already initialized before setup() is called i.e it is retaining its value from previously run mapper. Is this the way ?"

EDIT: mapred.job.reuse.jvm.num.tasks: JVM reuse no longer supported. mapreduce.job.jvm.numtasks: JVM reuse no longer supported.

EDIT: Hadoop 2.x does not support JVM reuse, so my first option is to use MultithreadedMapper and make my HashMap thread-safe. Is this a good option?
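For reference, a minimal sketch of what I have in mind (AmostraMapper is a placeholder for the real mapper class, and the thread count is arbitrary):

    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper;

    public class MultithreadedWiring {
        // AmostraMapper is a placeholder for the real mapper that uses amostraTable.
        public static void configure(Job job) {
            // Run several map threads inside one map task: all threads share the
            // same JVM, and therefore the same static amostraTable, which is why
            // the map has to be thread-safe.
            MultithreadedMapper.setMapperClass(job, AmostraMapper.class);
            MultithreadedMapper.setNumberOfThreads(job, 4);
            job.setMapperClass(MultithreadedMapper.class);
        }
    }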

1
Your code is not thread safe. Imagine the following interleaving: T1 evaluates if (amostraTable == null) to true; T2 evaluates the same check to true; T1 creates a new instance; T2 creates another instance. - Elrond_EGLDer
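One way to close that window (a sketch, assuming the map is populated once and then only read; SampleTableHolder and LOCK are made-up names) is double-checked locking on a volatile field:

    import java.util.HashMap;
    import java.util.Map;

    public class SampleTableHolder {
        private static final Object LOCK = new Object();
        private static volatile Map<Long,String> amostraTable = null;

        // Thread-safe lazy initialization (double-checked locking on a volatile field).
        // Safe only if the map is fully populated before being published and is
        // treated as read-only afterwards.
        static Map<Long,String> getOrCreate() {
            Map<Long,String> table = amostraTable;
            if (table == null) {
                synchronized (LOCK) {
                    table = amostraTable;
                    if (table == null) {
                        table = new HashMap<Long,String>();
                        // ... populate table here ...
                        amostraTable = table;
                    }
                }
            }
            return table;
        }
    }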

1 Answer

0 votes

I'm guessing you have multiple threads that don't see the variable update. Why don't you just declare it inline and make it final? Also, a ConcurrentHashMap might be more appropriate:

    private static final Map<Long,String> amostraTable = new ConcurrentHashMap<>();
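A setup() built on top of that could then populate the map idempotently, for example (a sketch; the sample entry and the loading logic are placeholders):

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        // putIfAbsent keeps the population safe and idempotent even if several
        // mapper threads run setup() concurrently; the isEmpty() check merely
        // avoids redundant work on a best-effort basis.
        if (amostraTable.isEmpty()) {
            // ... load the side data here (placeholder) ...
            amostraTable.putIfAbsent(1L, "example");
        }
    }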