Hive memory setting for local task during map join

Question

I'm using an HDInsight cluster (Hive version 0.13) to run some Hive queries. One of the queries (query 7 from the TPC-H suite), which launches a local task during a map join, fails due to insufficient memory (Hive aborts it because the hash table has reached the configured limit).

Hive seems to be allocating 1 GB to the local task. Where is this size picked up from, and how can I increase it?

2015-05-03 05:38:19        Starting to launch local task to process map join;               maximum memory = 932184064

I assumed the local task would use the same heap size as the mapper, but that does not seem to be the case. Any help is appreciated.

ShirishT · Accepted Answer · 2016-02-10T20:07:14

Quite late to this thread, but for others who face the same issue:

Although the documentation states that the local (child) JVM will have the same size as the map task's JVM (https://cwiki.apache.org/confluence/display/Hive/MapJoinOptimization), that does not seem to be the case. Instead, the JVM size is governed by the HADOOP_HEAPSIZE setting in hive-env.sh. So, in the case of the original post from Shradha, I suspect HADOOP_HEAPSIZE is set to 1 GB.
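As a minimal sketch of that fix, assuming hive-env.sh is the file your Hive client actually sources (its location varies by distribution) and that HADOOP_HEAPSIZE is read in MB, as in the stock hive-env.sh template, the heap could be raised like this. The value 2048 is only an illustrative choice, not a recommendation:

```shell
# hive-env.sh
# Heap size, in MB, for the JVM started by the hive shell script.
# 2048 is an assumed example value; size it to your hash table and host memory.
export HADOOP_HEAPSIZE=2048
```

If the setting takes effect, the "maximum memory" figure in the local task's log line should change in a new Hive session. For reference, the 932184064 bytes in the question is 889 MiB, which is plausible for a JVM capped at 1 GB, since the JVM usually reports somewhat less than the configured maximum.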
