I want to create a VIEW that always returns the last 1 hour of data from the 2 most recent Athena partitions.
I am using the following Amazon Athena DDL with a partition column called 'datehour' of type varchar.
CREATE EXTERNAL TABLE mydb.table_foo (
`account_id` string,
`account_email_address` string,
`record_timestamp` timestamp)
PARTITIONED BY (
`datehour` string)
ROW FORMAT SERDE
'org.openx.data.jsonserde.JsonSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
's3://data-here-9191919191/table_foo'
TBLPROPERTIES (
'projection.datehour.format'='yyyy/MM/dd/HH',
'projection.datehour.interval'='1',
'projection.datehour.interval.unit'='HOURS',
'projection.datehour.range'='2020/11/01/00,NOW',
'projection.datehour.type'='date',
'projection.enabled'='true',
'storage.location.template'='s3://data-here-9191919191/table_foo/${datehour}',
'transient_lastDdlTime'='1604013447')
Here's the query I want to run to create a VIEW that always returns the last 1 hour of data from a the 2 most recent partitions.
select *
from mydb.table_foo
where
(datehour = CONCAT(
CAST( year(current_timestamp) AS varchar) , '/',
CAST( month(current_timestamp) AS varchar), '/',
CAST( day(current_timestamp) AS varchar), '/',
CAST( hour(current_timestamp) AS varchar))
OR
datehour = CONCAT(
CAST( year(current_timestamp) AS varchar) , '/',
CAST( month(current_timestamp) AS varchar), '/',
CAST( day(current_timestamp) AS varchar), '/',
CAST( ( hour(current_timestamp - interval '1' hour) ) AS varchar)))
AND
record_timestamp BETWEEN (current_timestamp - interval '1' hour) AND current_timestamp
Sample SQL I'd like to make dynamic would be something like this:
select *
from table_foo
where datehour = '2020/11/17/23' or datehour = '2020/11/18/00' AND
record_timestamp BETWEEN (current_timestamp - interval '1' hour) AND current_timestamp
The dynamic WHERE logic poses problems around changes in all fields, month, day, and year. I ran into this problem on Day 1, around midnight today. How can I generate my datehour partition dynamically?