I am facing issue in executing bucketed map join.
I am using hive 0.10.
Table1 is a partitioned table on year,month and day. Each partition data is bucketed by column c1 into 128 buckets. I have almost 100 million records per day.
Table 1
create table1
(
....
....
)
partitioned by (year int,month int,day int)
CLUSTERED BY(c1) INTO 128 BUCKETS;
Table2 is a large lookup table bucketed on column c1. I have 80 million records loaded into 128 buckets.
Table 2
create table2
(
c1
c2
...
)
CLUSTERED BY(c1) INTO 128 BUCKETS;
I have checked the data and it's loaded as per expectation into buckets.
Now, I am trying to enforce bucketed map join.That's where I am stuck.
set hive.auto.convert.join=true;
set hive.optimize.bucketmapjoin = true;
set hive.mapjoin.bucket.cache.size=1000000;
select a.c1 as c1_tb2,a.c2
b.c1,b....
from table2 a
JOIN table1 b
ON (a.c1=b.c1);
I am still not getting bucketed map join. Am I missing something? Even I tried to execute join on only 1 partition. But, still I am getting same result.
Or
Bucketed map join doesn't work partition tables?
Please help.Thanks.
hive.enforce.bucketing=truebefore loading the data? Also, since the number of buckets are same, I think you'll be better off usingsort-mergejoin. - visakh