I have partitioned data stored in S3 in Hive format, like this:
bucket/year=2017/month=3/date=1/filename.json
bucket/year=2017/month=3/date=2/filename1.json
bucket/year=2017/month=3/date=3/filename2.json
Every partition has around 1,000,000 records. I have created a table and its partitions in Athena for this data.
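For context, partitions in a layout like this are usually registered with one ALTER TABLE ... ADD PARTITION statement per directory. The Python sketch below only builds those statements as strings; it is not the exact DDL I ran. The helper name add_partition_sql is made up for illustration, "bucket" is the placeholder from the paths above, and the backticks around date assume it has to be quoted as a reserved word in Athena DDL.

```python
def add_partition_sql(table, bucket, year, month, date):
    """Build an ALTER TABLE statement that maps one Hive-style S3 prefix to a partition."""
    # Hive-style layout: each partition column appears as key=value in the path.
    location = f"s3://{bucket}/year={year}/month={month}/date={date}/"
    # Assumption: `date` is backtick-quoted because it may be reserved in Athena DDL.
    return (
        f"ALTER TABLE {table} ADD IF NOT EXISTS "
        f"PARTITION (year='{year}', month='{month}', `date`='{date}') "
        f"LOCATION '{location}'"
    )


if __name__ == "__main__":
    # One statement per date directory shown in the listing above.
    for day in (1, 2, 3):
        print(add_partition_sql("mts_data_1", "bucket", 2017, 3, day))
```

Each statement would then be run in the Athena console (or through whatever client is in use) so that the filter on year, month and date can prune to a single prefix.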
Now I am running this query from Athena:
select count(*) from mts_data_1 where year='2017' and month='3' and date='1'
This query takes 1800 seconds to scan 1,000,000 records.
So my question is: how can I improve this query's performance?