0
votes

I'll try to explain my problem as clearly as possible. I would like to filter a table by date (selecting only the records whose date falls in the current month). In Oracle SQL I'm using the following query to achieve this:

select * from table t1 
where t1.DATE_COLUMN between TRUNC(SYSDATE, 'mm') and SYSDATE

How can I replicate the same filter in Hive SQL? The column I should use to apply the filter is a TIMESTAMP type column (e.g. 2017-05-15 00:00:00).

I'm using CDH 5.7.6-1.

Any advice?

Is DATE_COLUMN a partition column? - David דודו Markovitz
@DuduMarkovitz no, it's not a partition column. - user6385828

2 Answers

0
votes

Be aware that unix_timestamp() called without arguments is not fixed: it is evaluated at run time and its value can change during the query.
For that reason it cannot be used for partition elimination.
For newer Hive versions use current_date / current_timestamp instead.

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF

select  * 
from    table t1 
where   t1.DATE_COLUMN  
          between  cast(from_unixtime(unix_timestamp(),'yyyy-MM-01 00:00:00') as timestamp)
          and      cast(from_unixtime(unix_timestamp()) as timestamp)
;

select  cast (from_unixtime(unix_timestamp(),'yyyy-MM-01 00:00:00') as timestamp)
       ,cast (from_unixtime(unix_timestamp()) as timestamp)
;

+---------------------+---------------------+
|         _c0         |         _c1         |
+---------------------+---------------------+
| 2017-05-01 00:00:00 | 2017-05-16 01:04:55 |
+---------------------+---------------------+
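
The advice above about newer Hive versions can be sketched as follows. This is untested and assumes a Hive version (1.2.0 or later, per the UDF manual) that provides current_date, current_timestamp and trunc with the 'MM' format, so check the Hive version bundled with your distribution first. my_table is a hypothetical table name. According to the manual, all calls of current_timestamp within the same query return the same value, so both bounds stay fixed.

```sql
-- Sketch only: assumes Hive 1.2.0+; my_table is a placeholder name.
select  *
from    my_table t1
where   t1.DATE_COLUMN
          -- trunc(..., 'MM') yields the first day of the current month as 'yyyy-MM-dd';
          -- casting through date gives a timestamp at 00:00:00
          between  cast(cast(trunc(current_date, 'MM') as date) as timestamp)
          -- current time, constant for the whole query
          and      current_timestamp
;
```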
-1
votes

You can format as strings:

where date_format(t1.DATE_COLUMN, 'yyyy-MM') = date_format(current_timestamp, 'yyyy-MM')

I don't have Hive accessible right now, so this is untested. date_format takes Java SimpleDateFormat patterns, where the month is an uppercase 'M' (a lowercase 'm' means minutes), so the pattern should be 'yyyy-MM' rather than 'y-m'.