0
votes

According to Apache Hive Wiki (https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML), multi inserts "minimize the number of data scans required. Hive can insert data into multiple tables by scanning the input data just once (and applying different query operators) to the input data".

But, is it possible to apply joins to these multiple inserts?

I'm asking because I cannot apply the join in the common FROM statement, because I need to subset the data of the second table taking into account different parameters for each "first" table.

Thanks in advance!

2

2 Answers

1
votes

Yes you can apply join condition as well. Below is the query format :

from emp a,emp b
insert into table manager select distinct a.manager_id, a.name where a.id=b.manager_id 
insert into table dept select distinct a.dept_id, a.project_id;

Above query will apply self join on emp table and will extract manager_id and name and then will insert into the manager table. Also it will insert project_id and dept_id in the dept table.

In simple words using inner join below query will also do the same task as above :

from emp a inner join emp b 
insert into table manager select distinct a.manager_id, a.name where a.id=b.manager_id 
insert into table dept select distinct a.dept_id, a.project_id;
0
votes

It should be possible like this.

from (
select a.*,b* from a inner join b on a.id=b.id2
)
insert ..
insert ..