I'm using an HDInsight cluster (Hive version 0.13) to run some Hive queries. One of the queries (query 7 from the TPC-H suite), which launches a local task during a map join, fails due to insufficient memory: Hive aborts it because the hash table has reached the configured limit.

Hive seems to be allocating 1GB to the local task. Where is this size picked up from, and how can I increase it?

2015-05-03 05:38:19        Starting to launch local task to process map join;               maximum memory = 932184064

I assumed the local task would use the same heap size as the mapper, but that does not seem to be the case. Any help is appreciated.
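The `maximum memory` figure in that log line is a byte count. A quick conversion (a small Python sketch; the constant is simply the value copied from the log) shows it is exactly 889 MiB, a little under 1 GB:

```python
# Convert the "maximum memory" value from the Hive local-task log line to MiB.
LOGGED_MAX_MEMORY_BYTES = 932184064  # copied from the log line above


def bytes_to_mib(num_bytes: int) -> float:
    """Return a byte count expressed in mebibytes (1 MiB = 1024 * 1024 bytes)."""
    return num_bytes / (1024 * 1024)


if __name__ == "__main__":
    print(f"{bytes_to_mib(LOGGED_MAX_MEMORY_BYTES):.0f} MiB")
```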

3 Answers

Quite late to this thread, but this is for others who face the same issue.

Although the documentation (https://cwiki.apache.org/confluence/display/Hive/MapJoinOptimization) states that the local (child) JVM will have the same heap size as the map task, that does not seem to be the case. Instead, the JVM heap size is governed by the HADOOP_HEAPSIZE setting in hive-env.sh. So, in the case of the original post from Shradha, I suspect HADOOP_HEAPSIZE is set to 1GB.
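To illustrate, here is a simplified Python model of how a HADOOP_HEAPSIZE value (in MB) is typically turned into the `-Xmx` flag of a client JVM. It assumes the Hadoop 2.x launcher-script behaviour, where an unset or empty HADOOP_HEAPSIZE falls back to `-Xmx1000m`; the function name `java_heap_max` is hypothetical, and the real logic lives in the shell scripts, not in Python.

```python
# Simplified model (assumption: Hadoop 2.x launcher-script behaviour) of how
# HADOOP_HEAPSIZE, given in MB, becomes the -Xmx flag of a client JVM such as
# the one Hive starts for the map-join local task.
from typing import Optional

DEFAULT_JAVA_HEAP_MAX = "-Xmx1000m"  # assumed default when HADOOP_HEAPSIZE is unset


def java_heap_max(hadoop_heapsize: Optional[str]) -> str:
    """Mirror the shell logic: an unset or empty HADOOP_HEAPSIZE keeps the default."""
    if hadoop_heapsize:
        return f"-Xmx{hadoop_heapsize}m"
    return DEFAULT_JAVA_HEAP_MAX


if __name__ == "__main__":
    print(java_heap_max(None))
    print(java_heap_max("4096"))
```

Under this model, an unset HADOOP_HEAPSIZE gives `-Xmx1000m`, which would be consistent with the roughly 1GB limit described in the question, while exporting, for example, `HADOOP_HEAPSIZE=4096` in hive-env.sh would give `-Xmx4096m`.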

This property controls it:

yarn.app.mapreduce.am.command-opts

These are the JVM options for the MapReduce ApplicationMaster (AM), since the local task runs on the AM.

You can also try this property:

set hive.mapjoin.localtask.max.memory.usage = 0.999;
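The effect of `hive.mapjoin.localtask.max.memory.usage` can be sketched as a ratio check: the local task is aborted once used heap divided by maximum heap exceeds the configured fraction. This is a simplified Python model, not Hive's own code; it assumes the default fraction is 0.90, and `usage_limit_mib` and `exceeds_limit` are hypothetical helper names.

```python
# Simplified model of the local-task memory check: abort once
# used_heap / max_heap exceeds hive.mapjoin.localtask.max.memory.usage.
MIB = 1024 * 1024


def usage_limit_mib(max_heap_bytes: int, max_memory_usage: float) -> float:
    """Heap usage (in MiB) above which the local task would be aborted."""
    return max_heap_bytes * max_memory_usage / MIB


def exceeds_limit(used_bytes: int, max_heap_bytes: int, max_memory_usage: float) -> bool:
    """True when the used fraction of the heap is above the configured limit."""
    return used_bytes / max_heap_bytes > max_memory_usage


if __name__ == "__main__":
    max_heap = 932184064  # "maximum memory" from the question's log line
    for fraction in (0.90, 0.999):  # assumed default vs. the suggested value
        print(f"{fraction}: abort above ~{usage_limit_mib(max_heap, fraction):.0f} MiB")
```

With the 932184064-byte heap from the question, raising the fraction from 0.90 to 0.999 moves the abort point from about 800 MiB to about 888 MiB, so this setting buys at most roughly 88 MiB; for a bigger gain, the heap itself has to grow.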

You can use HADOOP_HEAPSIZE=512 or HADOOP_CLIENT_OPTS=-Xmx512m, both of which can be set in hadoop-env.sh.
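Either variable can work because, in a simplified model of the Hadoop 2.x `hadoop` launcher script (an assumption; check the scripts shipped with your distribution), the `-Xmx` derived from HADOOP_HEAPSIZE is placed before HADOOP_CLIENT_OPTS on the `java` command line, and HotSpot honours the last `-Xmx` it sees. The Python function `effective_xmx` below is hypothetical and only models that ordering:

```python
# Simplified model (assumption: Hadoop 2.x scripts put the HADOOP_HEAPSIZE-derived
# -Xmx before HADOOP_CLIENT_OPTS, and the JVM uses the last -Xmx it is given).
import shlex
from typing import Optional


def effective_xmx(hadoop_heapsize: Optional[str] = None, hadoop_client_opts: str = "") -> str:
    """Return the -Xmx flag that would take effect on the modelled command line."""
    heap_flag = f"-Xmx{hadoop_heapsize}m" if hadoop_heapsize else "-Xmx1000m"
    args = [heap_flag] + shlex.split(hadoop_client_opts)
    xmx_flags = [arg for arg in args if arg.startswith("-Xmx")]
    return xmx_flags[-1]  # the last occurrence wins


if __name__ == "__main__":
    print(effective_xmx("2048", "-Xmx512m"))
```

So if both are set, HADOOP_CLIENT_OPTS wins under this model: `effective_xmx("2048", "-Xmx512m")` returns `-Xmx512m`.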

Note, however, that this might lead to unexpected behavior for some jobs, and you will probably have to tune

mapreduce.map.memory.mb and mapreduce.map.java.opts

as well as

mapreduce.reduce.memory.mb and mapreduce.reduce.java.opts in mapred-site.xml to make sure that jobs remain stable.
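When resizing `mapreduce.map.memory.mb` or `mapreduce.reduce.memory.mb`, the heap in the matching `*.java.opts` usually has to stay below the container size so that non-heap JVM memory still fits. A common rule of thumb (an assumption, not something this answer states) is a heap of about 80% of the container. The sketch below uses a hypothetical `java_opts_for` helper and illustrative 4096/8192 MB containers to derive the `-Xmx` values that way:

```python
# Rule-of-thumb sketch (assumption): keep the task JVM heap (-Xmx in
# mapreduce.*.java.opts) at about 80% of the container (mapreduce.*.memory.mb).
HEAP_FRACTION = 0.8  # hypothetical ratio; tune for your workload


def java_opts_for(memory_mb: int, heap_fraction: float = HEAP_FRACTION) -> str:
    """Derive a -Xmx flag that leaves headroom inside a container of memory_mb."""
    if memory_mb <= 0:
        raise ValueError("memory_mb must be positive")
    if not 0 < heap_fraction < 1:
        raise ValueError("heap_fraction must be between 0 and 1")
    return f"-Xmx{int(memory_mb * heap_fraction)}m"


if __name__ == "__main__":
    # Illustrative container sizes only.
    for prefix, mb in (("mapreduce.map", 4096), ("mapreduce.reduce", 8192)):
        print(f"{prefix}.memory.mb={mb}  {prefix}.java.opts={java_opts_for(mb)}")
```

Truncating with `int()` rounds the heap down, which errs on the side of leaving a little more headroom in the container.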