0
votes

I have some questions about skew join in Hive.

1. When will Hive use a common join to process the data? I only see map joins after I set the properties below:

  • set hive.optimize.skewjoin=true;
  • set hive.mapjoin.smalltable.filesize=2;

2. Why doesn't skew join work with a left join?

Below are the tables and the SQL:

tmp.skew_large_table  columns: imei, imsi, mac, phone, data_date
    total rows: 2,900,808
    skew key: 868407035454956 (670081 rows)
-----------
tmp.test_skew_small_table  columns: imei, package, data_date
    total rows: 8,576,164
    skew key: 868407035454956 (10461 rows)
-----------

sql:
select a.*,b.*
    from tmp.skew_large_table a
    join
    tmp.test_skew_small_table b
    on a.imei=b.imei;
1
Why are you setting hive.mapjoin.smalltable.filesize=2;? Is it an attempt to disable map join? - leftjoin
Yes, I want to get a common join. - Jax Ma
Then set hive.auto.convert.join=false; is a more obvious way to switch off map join. - leftjoin
It still processes the skew key with a map join. - Jax Ma

1 Answer

0
votes

After reading the Hive source code, I found the answers.

Q1:

hive.mapjoin.smalltable.filesize and hive.auto.convert.join do not affect skew join.

For every skew join, Hive uses a map join to handle the skewed keys.
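To make the idea of a "skew key" concrete, here is a minimal Java sketch that picks out keys whose row count exceeds a threshold. It is only a toy model and not Hive's implementation: the class and method names are hypothetical, and the threshold merely stands in for Hive's hive.skewjoin.key setting. In Hive, rows for such keys are the ones handed to the follow-up map join.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model only (hypothetical names); this is NOT Hive's code.
final class SkewKeySketch {

    // Returns the join keys that occur more than `threshold` times.
    // `threshold` plays the role of hive.skewjoin.key in this sketch.
    static Set<String> findSkewKeys(List<String> joinKeys, int threshold) {
        Map<String, Integer> counts = new HashMap<>();
        for (String key : joinKeys) {
            counts.merge(key, 1, Integer::sum); // count rows per key
        }
        Set<String> skewKeys = new HashSet<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > threshold) {
                skewKeys.add(e.getKey()); // too many rows: treat as skewed
            }
        }
        return skewKeys;
    }

    public static void main(String[] args) {
        List<String> imeis = Arrays.asList(
                "868407035454956", "868407035454956", "868407035454956", "111");
        // With a threshold of 2, only the repeated key is reported as skewed.
        System.out.println(findSkewKeys(imeis, 2));
    }
}
```

Keys that stay under the threshold would go through the ordinary join path in this model; only the reported keys need the separate map-join step.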

Q2:

An outer join will not trigger skew join, as the source code below shows:

    // We are trying to adding map joins to handle skew keys, and map join right
    // now does not work with outer joins
    if (!GenMRSkewJoinProcessor.skewJoinEnabled(parseCtx.getConf(), joinOp)) {
      return;
    }
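The effect of that guard can be illustrated with a small self-contained Java sketch. It assumes, based only on the comment quoted above, that the check passes when the skew-join setting is on and the join contains no outer join. The class, enum, and parameter names are hypothetical, and the real skewJoinEnabled method may test further conditions.

```java
import java.util.Arrays;
import java.util.List;

// Toy model of the eligibility check (hypothetical names); NOT the Hive source.
final class SkewJoinEligibility {

    enum JoinType { INNER, LEFT_OUTER, RIGHT_OUTER, FULL_OUTER }

    // Returns true only when skew join is switched on and every join is an inner join.
    static boolean skewJoinEnabled(boolean optimizeSkewJoin, List<JoinType> joinTypes) {
        if (!optimizeSkewJoin) {
            return false; // corresponds to hive.optimize.skewjoin=false
        }
        for (JoinType type : joinTypes) {
            if (type != JoinType.INNER) {
                return false; // assumption: any outer join disables the optimization
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Inner join, as in the query from the question: eligible.
        System.out.println(skewJoinEnabled(true, Arrays.asList(JoinType.INNER)));      // prints "true"
        // Same query rewritten as a left join: not eligible.
        System.out.println(skewJoinEnabled(true, Arrays.asList(JoinType.LEFT_OUTER))); // prints "false"
    }
}
```

Under this model, changing `join` to `left join` in the query means the skew-join step is skipped and the plan stays as an ordinary join.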