1
votes

I have inherited an S3 bucket from a former colleague, in which the files are partitioned by id and time, such as:

s3://bucket/partition_id=0/year=2017/month=6/day=1/file

The data in all these files forms one table, which can be queried through Athena. The Glue catalogue also shows that partition(0) is id, partition(1) is year, and so on.

Recently I wanted to rebuild this setup and decided that partitioning by id is not very straightforward. I tried pointing a Glue crawler at the S3 bucket, but I could not find anywhere to specify that I only want to partition by time, not id, like this:

s3://bucket/year=2017/month=6/day=1/file

I am quite new to AWS and not sure whether this is possible or even makes sense. Any feedback would be appreciated. Thank you.

2 Answers

1
votes

I don't think you can do it with the crawler. However, you can create a new table manually in Athena with a CTAS query like this (see also https://docs.aws.amazon.com/en_us/athena/latest/ug/ctas-examples.html):

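-- partition_id is not listed in partitioned_by, so it becomes a regular column.
-- The partition columns (year, month, day) must come last in the SELECT list,
-- which SELECT * on the old partitioned table should already satisfy.
CREATE TABLE new_table
WITH (
     format = 'ORC',
     external_location = 's3://...',
     partitioned_by = ARRAY['year', 'month', 'day'])
AS SELECT *
FROM old_table;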
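
One caveat with the query above: a single CTAS or INSERT INTO query in Athena can write at most 100 partitions, so a table covering more than about three months of daily partitions needs to be filled in batches. Below is a minimal sketch of one way to generate those batches, one month per statement. The helper name `build_statements` is hypothetical, and it assumes that `year` and `month` are string-typed partition columns (drop the quotes if yours are integers); the statements would still have to be run one after another in Athena.

```python
from typing import List, Tuple


def build_statements(
    months: List[Tuple[str, str]],
    location: str,
    old_table: str = "old_table",
    new_table: str = "new_table",
) -> List[str]:
    """Return one CTAS for the first (year, month) and one INSERT INTO for each later one.

    Each statement touches a single month, i.e. at most 31 day partitions,
    which stays under the 100-partition limit per query.
    """
    statements = []
    for i, (year, month) in enumerate(months):
        # Assumption: year and month are stored as strings in the old table.
        where = f"WHERE year = '{year}' AND month = '{month}'"
        if i == 0:
            # The first batch creates the table and its first partitions.
            statements.append(
                f"CREATE TABLE {new_table} WITH (format = 'ORC', "
                f"external_location = '{location}', "
                f"partitioned_by = ARRAY['year', 'month', 'day']) "
                f"AS SELECT * FROM {old_table} {where}"
            )
        else:
            # Later batches append new partitions to the existing table.
            statements.append(
                f"INSERT INTO {new_table} SELECT * FROM {old_table} {where}"
            )
    return statements
```

For example, `build_statements([("2017", "6"), ("2017", "7")], location)` returns two statements, where `location` is whatever S3 path you chose for `external_location`.
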
0
votes

Write a Python shell job that uses the boto3 S3 API to reorganize the folder structure, and then run the crawler on the new layout.
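
A rough sketch of what such a job might look like, assuming the key layout from the question (`partition_id=<id>/year=.../month=.../day=.../file`). The function names and the `by_time/` destination prefix are made up for illustration. Because files from different ids can share a name, the id is folded into the file name here to avoid overwriting; that is a design choice, not a requirement.

```python
import re
from typing import Optional

# Assumed layout: partition_id=<id>/<time partitions...>/<file name>
_KEY = re.compile(r"^partition_id=(?P<pid>[^/]+)/(?P<rest>.+/)(?P<name>[^/]+)$")


def time_only_key(key: str, dest_prefix: str = "by_time/") -> Optional[str]:
    """Map an old key to a key partitioned by time only, or None if it does not match."""
    m = _KEY.match(key)
    if m is None:
        return None
    # Keep the id in the file name so files from different ids do not collide.
    return f"{dest_prefix}{m['rest']}id{m['pid']}_{m['name']}"


def reorganize(bucket: str, dest_prefix: str = "by_time/", dry_run: bool = True) -> None:
    """Copy every object under partition_id=... to its time-only key."""
    import boto3  # imported here so time_only_key can be tried without AWS access

    s3 = boto3.client("s3")
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix="partition_id="):
        for obj in page.get("Contents", []):
            old_key = obj["Key"]
            new_key = time_only_key(old_key, dest_prefix)
            if new_key is None:
                continue  # skip keys that do not follow the assumed layout
            print(f"{old_key} -> {new_key}")
            if not dry_run:
                # copy_object handles objects up to 5 GB; larger ones need a multipart copy.
                s3.copy_object(
                    Bucket=bucket,
                    Key=new_key,
                    CopySource={"Bucket": bucket, "Key": old_key},
                )
```

Run it with `dry_run=True` first to check the mapping, then point the crawler at the destination prefix only, so it does not pick up both layouts. Note that if the id exists only in the path and not inside the files, it will no longer be available as a column after this move.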