11
votes

I have a Firehose delivery stream that stores data in S3 using the default directory structure, "YYYY/MM/DD/HH", and a table in Athena with these columns defined as partitions:

year: string, month: string, day: string, hour: string

After running

msck repair table clicks

I only receive:

Partitions not in metastore:    clicks:2017/08/26/10

I can add these partitions manually and everything works. However, I was wondering why msck repair does not add these partitions automatically and update the metastore.

2
@DuduMarkovitz my issue is that after running msck repair, new partitions are not added automatically, as the posts above show - Sam
... and like my answers show, only a specific directory naming convention, which you are not using, is supported - David דודו Markovitz
Thanks, I'll add year, month, day, and hour explicitly to my directories - Sam

2 Answers

7
votes

To use Athena MSCK REPAIR with S3 you need to use key-value pairs as path prefix:

clicks/year=2017/month=08/day=26/hour=10/

instead of: clicks/2017/08/26/10/
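If the objects are already in S3, they have to be copied to the new layout. As a rough sketch of the key mapping only (the helper name `to_hive_style` and the `clicks` prefix default are my own illustration, not part of any AWS SDK), something like this in Python converts one key to the other form:

```python
# Hypothetical helper: maps a Firehose-style key to a Hive-style key.
# Partition names match the table definition in the question.
PARTITION_NAMES = ("year", "month", "day", "hour")


def to_hive_style(key: str, table_prefix: str = "clicks") -> str:
    """Rewrite '<prefix>/2017/08/26/10/<rest>' as
    '<prefix>/year=2017/month=08/day=26/hour=10/<rest>'."""
    prefix = table_prefix.strip("/") + "/"
    if not key.startswith(prefix):
        raise ValueError(f"key does not start with {prefix!r}: {key!r}")

    parts = key[len(prefix):].split("/")
    values, rest = parts[:4], parts[4:]
    if len(values) < 4 or not all(v.isdigit() for v in values):
        raise ValueError(f"expected four numeric path segments in {key!r}")

    # Build the key=value segments, then re-attach whatever followed them.
    partition_path = "/".join(
        f"{name}={value}" for name, value in zip(PARTITION_NAMES, values)
    )
    return prefix + partition_path + "/" + "/".join(rest)
```

You would still need to do the actual copy yourself (for example with the S3 console, CLI, or an SDK); this only computes the destination key.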

Alternatively, update the partitions directly in Glue (manually or use a crawler).
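For the manual route, if you want to keep the existing Firehose layout, one option is to run an `ALTER TABLE ... ADD PARTITION` statement per prefix with an explicit `LOCATION`. Here is a minimal Python sketch that only builds the statement text; the function name `add_partition_sql` and the bucket `example-bucket` in the usage note are placeholders I made up, and it assumes the location ends in the four year/month/day/hour segments:

```python
# Hypothetical helper: builds the DDL for one partition, does not run it.
def add_partition_sql(table: str, location: str) -> str:
    """Return an ALTER TABLE statement that registers one
    year/month/day/hour partition at the given S3 location."""
    # Assumes the last four path segments are year, month, day, hour.
    year, month, day, hour = location.rstrip("/").split("/")[-4:]
    return (
        f"ALTER TABLE {table} ADD IF NOT EXISTS "
        f"PARTITION (year='{year}', month='{month}', day='{day}', hour='{hour}') "
        f"LOCATION '{location}'"
    )
```

Calling `add_partition_sql("clicks", "s3://example-bucket/clicks/2017/08/26/10/")` gives you a statement you can paste into the Athena query editor or submit from a scheduled job.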

Found this here: https://forums.aws.amazon.com/message.jspa?messageID=789078

1
votes

For future reference, in addition to the two tips mentioned in this article: https://aws.amazon.com/premiumsupport/knowledge-center/athena-aws-glue-msck-repair-table/

  • Allow glue:BatchCreatePartition in the IAM policy
  • Change the S3 path to flat case (all lowercase rather than camel case)

You also need to set the TableType attribute to a non-null value. In my case, it was EXTERNAL_TABLE.