2
votes

I have two hive tables table1 and table2. There are no columns in common in both the tables.

table1:

------
| id |
------
| 22 |
| 11 |
| 56 |
| 15 |
------

table2:

-------------
| from | to |
-------------
|  10  | 20 |
|  21  | 35 |
|  40  | 60 |
-------------

The id column of table1 must be checked in table2 as to which range it belongs from columns 'from' and 'to' of table2.

Expected output:

------------------
| id | from | to |
------------------
| 22 |  21  | 35 |
| 11 |  10  | 20 |
| 56 |  40  | 60 |
| 15 |  10  | 20 |
------------------

Tried using cross join along with where condition and able to get desired output(But want to avoid cross joining) .Also tried using 'exists' command but facing error in getting output:

Query:

select id from table1 t1
where exists(select 1 from table2 t2
where t1.id between t2.from and t2.to) ;

But getting error as : subquery expression refers to both parent and subquery expressions and is not a valid join condition.

Any suggestions would be helpful at the earliest.

Thanks

2

2 Answers

2
votes

Here is step by step explanation to get your desired result:

hive> create table table1(id int);
hive> create table table2(from_ int, to_ int);
hive> select * from table1;
OK
22
11
56
15

hive> select * from table2;
OK
10  20
21  35
40  60

Your SQL should be look like as below to get the desired result:

select id,from_,to_
from table1 t1
left join table2 t2 on(t1.id between t2.from_ and t2.to_);

Output:
id      from_   to_
22      21      35
11      10      20
56      40      60
15      10      20
0
votes

Complex expressions in ON clause are supported, starting with Hive 2.2.0 (see HIVE-15211, HIVE-15251). Prior to that, Hive did not support join conditions that are not equality conditions.

So, the only solution in Hive < 2.2.0 seems to apply CROSS JOIN + filter