1
votes

I want to create a VIEW that always returns the last 1 hour of data from the 2 most recent Athena partitions.

I am using the following Amazon Athena DDL with a partition column called 'datehour' of type varchar.

CREATE EXTERNAL TABLE mydb.table_foo (
`account_id` string,
`account_email_address` string,
`record_timestamp` timestamp)
PARTITIONED BY ( 
  `datehour` string)
ROW FORMAT SERDE 
  'org.openx.data.jsonserde.JsonSerDe' 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.mapred.TextInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  's3://data-here-9191919191/table_foo'
TBLPROPERTIES (
  'projection.datehour.format'='yyyy/MM/dd/HH', 
  'projection.datehour.interval'='1', 
  'projection.datehour.interval.unit'='HOURS', 
  'projection.datehour.range'='2020/11/01/00,NOW', 
  'projection.datehour.type'='date', 
  'projection.enabled'='true', 
  'storage.location.template'='s3://data-here-9191919191/table_foo/${datehour}', 
  'transient_lastDdlTime'='1604013447')

Here's the query I want to run to create a VIEW that always returns the last 1 hour of data from a the 2 most recent partitions.

select *
from mydb.table_foo
where
(datehour = CONCAT( 
  CAST( year(current_timestamp) AS varchar) , '/',
  CAST( month(current_timestamp) AS varchar), '/',
  CAST( day(current_timestamp) AS varchar), '/',
  CAST( hour(current_timestamp) AS varchar)) 
 OR
  datehour = CONCAT( 
  CAST( year(current_timestamp) AS varchar) , '/',
  CAST( month(current_timestamp) AS varchar), '/',
  CAST( day(current_timestamp) AS varchar), '/',
  CAST( ( hour(current_timestamp - interval '1' hour) ) AS varchar)))
AND
 record_timestamp BETWEEN  (current_timestamp - interval '1' hour)  AND current_timestamp

Sample SQL I'd like to make dynamic would be something like this:

select *
from table_foo
where datehour = '2020/11/17/23' or datehour = '2020/11/18/00' AND
 record_timestamp BETWEEN  (current_timestamp - interval '1' hour) AND current_timestamp

The dynamic WHERE logic poses problems around changes in all fields, month, day, and year. I ran into this problem on Day 1, around midnight today. How can I generate my datehour partition dynamically?

1

1 Answers

1
votes

The solution should be that you cast current_timestamp - interval '1' hour as varchar in the correct format. Assuming the data is partitioned as yyyy/MM/dd/HH as described in your table definition:

select *
from mydb.table_foo
WHERE datehour BETWEEN date_format(date_trunc('hour',current_timestamp - interval '1' hour),'%Y/%m/%d/%H') AND date_format(date_trunc('hour',current_timestamp),'%Y/%m/%d/%H')
AND record_timestamp BETWEEN  (current_timestamp - interval '1' hour)  AND current_timestamp