4
votes

I'm trying to use AWS Athena to query csv data files on S3, there're multiple date columns in the csv file, but the column types in Athena do not have "date", I tried to use "timestamp", but then the date columns could not be queried.

Any suggestions?

2

2 Answers

6
votes

Actually Athena has a column type date.

Here a short sample of a table with dates.

CSV:

2016-10-12,2016-10-01,hello,world1
2016-10-13,2016-10-01,hello,world2
2016-10-14,2016-10-01,hello,world3
2016-10-15,2016-10-01,hello,world4

DDL:

CREATE EXTERNAL TABLE test (
  startdate date,
  enddate date,
  val1 string,
  val2 string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
  'serialization.format' = ',',
  'field.delim' = ','
) LOCATION 's3://test-data/test/';

QUERY:

select * from test where startdate > DATE'2016-10-13';
0
votes

Amazon Athena uses Apache Hive. Apache Hive allows primitive_type of 'date' in Hive 0.12.0 and later. I've tested and confirmed that it works with the current Athena implementation.

Link to documentation: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL