6
votes

My hive table is partitioned on year, month, day, Hour

Now I want to fetch data from 2014-05-27 to 2014-06-05 How can I do that??

I know one option is create partition on epoch(or yyyy-mm-dd-hh) and in query pass epoch time. Can I do it without loosing date hierarchy??

Table Structure

CREATE TABLE IF NOT EXISTS table1 (col1 int, col2 int)
PARTITIONED BY (year int, month int, day int, hour int) 
STORED AS TEXTFILE;
3
Could you paste a sample of data which have in your hive table? - Lalit Agarwal
@LalitAgarwal I have added table structure in questions. Thanks for pointing it. - banjara

3 Answers

10
votes

This is a similar scenario we face everyday while querying tables in hive. We have partitioned our tables similar to the way you explained and it has helped a lot if querying. This is how we partition:

CREATE TABLE IF NOT EXISTS table1 (col1 int, col2 int)
PARTITIONED BY (year bigint, month bigint, day bigint, hour int) 
STORED AS TEXTFILE;

For partitions we assign values like this:

year = 2014, month = 201409, day = 20140924, hour = 01

This way the querying becomes really simple and you can directly query:

select * from table1 where day >= 20140527 and day < 20140605 

Hope this helps

3
votes

you can query like this

  WHERE st_date > '2014-05-27-00' and end_date < '2014-06-05-24' 

should give you desired result because even if it is a sting a it will be compared lexicographically i.e '2014-04-04' will be always greater '2014-04-03'.

I ran it on my sample tables and it works perfectly fine.

1
votes

You can use CONCAT with LPAD.

Say you want to get all partitions between 2020-03-24, hour=00 to 2020-04-24, hour=23, then, your 'where' condition would look like:

WHERE (CONCAT(year, '-', LPAD(month,2,'0'), '-', LPAD(day,2,'0'), '_', LPAD(hour,2,'0')) > '2020-03-24_00')
AND (CONCAT(year, '-', LPAD(month,2,'0'), '-', LPAD(day,2,'0'), '_', LPAD(hour,2,'0')) < '2020-04-24_23')