Hive Bucketed Map Join

Question

I am having trouble getting a bucketed map join to execute.

I am using Hive 0.10.

Table1 is partitioned on year, month and day. Each partition's data is bucketed by column c1 into 128 buckets. I have almost 100 million records per day.

Table 1
create table table1
(
....
....
)
partitioned by (year int,month int,day int)
CLUSTERED BY(c1) INTO 128 BUCKETS;

Table2 is a large lookup table bucketed on column c1. I have 80 million records loaded into 128 buckets.

Table 2
create table table2
(
 c1
 c2
 ...
)
CLUSTERED BY(c1) INTO 128 BUCKETS;

I have checked the data, and it is loaded into the buckets as expected.

Now I am trying to enforce a bucketed map join. That's where I am stuck.

set hive.auto.convert.join=true;
set hive.optimize.bucketmapjoin = true;
set hive.mapjoin.bucket.cache.size=1000000;

select a.c1 as c1_tb2, a.c2,
       b.c1, b....
from table2 a
JOIN table1 b
ON (a.c1=b.c1);

I am still not getting a bucketed map join. Am I missing something? I even tried running the join on only one partition, but I still get the same result.

Or does bucketed map join not work with partitioned tables?

Please help. Thanks.

Have you set hive.enforce.bucketing=true before loading the data? Also, since the number of buckets is the same, I think you'll be better off using a sort-merge join. — visakh
I have set the hive.enforce.bucketing=true parameter. I tried using a sort-merge join, but how can I tell whether a sort-merge join is actually taking place? — jigarshah
To make sure what's going on, try using the explain command on your query. — fd8s0
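
For the sort-merge variant discussed in the comments above, a minimal sketch of the settings and the explain check might look like this. It assumes both tables were also declared SORTED BY (c1), which the DDL in the question does not show. The operator name mentioned afterwards is from memory and may differ between Hive versions.

```sql
-- Sketch only: settings commonly documented for sort-merge bucket (SMB) map joins.
set hive.input.format = org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
set hive.optimize.bucketmapjoin = true;
set hive.optimize.bucketmapjoin.sortedmerge = true;

-- Assumption: table1 and table2 are both CLUSTERED BY (c1) SORTED BY (c1) INTO 128 BUCKETS.
explain
select /*+ MAPJOIN(a) */ a.c1, a.c2, b.c1
from table2 a
join table1 b on (a.c1 = b.c1);
```

If the plan lists an operator named something like Sorted Merge Bucket Map Join Operator, the sort-merge path is probably being used. A plain Join Operator in a reduce stage suggests it is not.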

Siva Narayanan · Accepted Answer · 2015-06-02T07:30:00

This explanation is for Hive 0.13. AFAICT, bucketed map join doesn't take effect for auto converted map joins. You will need to explicitly call out map join in the syntax like this:

set hive.optimize.bucketmapjoin = true;
explain extended select /*+ MAPJOIN(b) */ count(*)
from nation_b1 a 
join nation_b2 b on (a.n_regionkey = b.n_regionkey);

Note that only explain extended shows you the flag that indicates if bucket map join is being used or not. Look for this line in the plan.

BucketMapJoin: true
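
Applied to the tables in the question, the same approach might look like the sketch below. It is not taken from the answer. The partition values are hypothetical, and hinting table2 as the in-memory side is an assumption based on it being the lookup table. As far as I know, hive.ignore.mapjoin.hint was added in Hive 0.11 and defaults to true, so it is switched off here on the assumption that the hint would otherwise be ignored.

```sql
-- Sketch only: adapts the answer's query to table1/table2 from the question.
set hive.optimize.bucketmapjoin = true;
-- Assumption: on Hive 0.11+ the MAPJOIN hint may be ignored unless this is false.
set hive.ignore.mapjoin.hint = false;

-- table2 (alias a) is hinted as the map-side table; partition values are hypothetical.
explain extended
select /*+ MAPJOIN(a) */ a.c1 as c1_tb2, a.c2, b.c1
from table2 a
join table1 b on (a.c1 = b.c1)
where b.year = 2015 and b.month = 6 and b.day = 2;
```

If the plan contains BucketMapJoin: true, drop the explain extended prefix and run the query itself.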
