1
votes

I am facing a hive problem.

  1. I will get a 0 or 1 after from sql

"select count(*) from table1 where ..."

  1. If the result is 1, then I will execute the sql

"Insert Into table2 partition(d) (select xxxx from table 1 where ... group by t)"

Otherwise do nothing.

My question is how can I write these two sql together into one sql. I am only allowed to write a single long sql.

I tried to put the first sql into the where condition in sql2, but it throwed an error said it's not supported to operat on table1 in the subquery (couldn't remember clearly, something like this).

It sounds like a very easy question for experienced programmers, but I just started lerning hive for 2 days.

1

1 Answers

0
votes

If select in insert overwrite table partition does not returns rows, nothing is being overwritten.

So, just calculate your count in the same dataset and filter by it, use analytics funtion if you want to aggregate data on different level before insert

Insert Into table2 partition(d) 
select col1, col2, sum(col3), etc, etc, partition_col
from
(
select --some columns here,
       --Assign the same count to all rows
       count(case when your_boolean_condition then 1 else null end) over () as cnt
  from table 1
) s 
 where cnt=1 --If this is not satisfied, no overwrite will happen 
   AND more_conditions 
group by ...

Another approach possible is to use cross-join with your count:

insert Into table2 partition(d) 
select xxxx ... ... ..., partition_column as d 
from 
(
select t.*, c.cnt
table1 t cross join (select count(*) cnt from table1 where condition) c
)s
where cnt=1 <and another_condition>