I am trying to copy data from a source AWS S3 bucket to a destination AWS S3 bucket.
Current partition layout of the source bucket:
AccountId/Type/Role/Job/Name/RequestId/JobName/file1.csv
I have many prefixes with this partition layout.
In the destination bucket, I want to change the layout to:
AccountId/Type/Role/Job/Name/file2.csv
and add RequestId and JobName as new columns in the CSV file, with their respective values, and merge all the CSV files into one file.
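To make the intended transformation concrete, here is a minimal local sketch in plain Python (no AWS calls). It assumes every source key has exactly the seven path segments shown above plus a file name, and that all CSV files share the same header row. The function names (split_source_key, destination_key, merge_csvs) and the sample keys are hypothetical, used only for illustration.

```python
import csv
import io

# Path segments of a source key, in order (taken from the layout above).
SOURCE_FIELDS = ["AccountId", "Type", "Role", "Job", "Name", "RequestId", "JobName"]


def split_source_key(key):
    """Split a source key into its named path segments and the file name."""
    parts = key.split("/")
    if len(parts) != len(SOURCE_FIELDS) + 1:
        raise ValueError(f"unexpected key layout: {key!r}")
    return dict(zip(SOURCE_FIELDS, parts[:-1])), parts[-1]


def destination_key(segments, filename="file2.csv"):
    """Build the destination key from the first five segments only."""
    prefix = "/".join(segments[name] for name in SOURCE_FIELDS[:5])
    return f"{prefix}/{filename}"


def merge_csvs(files):
    """Merge (source_key, csv_text) pairs into one CSV per destination key.

    RequestId and JobName are taken from the source key and appended to
    every row as two new columns. Assumes all inputs share one header.
    """
    merged = {}  # destination key -> (header, rows)
    for key, text in files:
        segments, _ = split_source_key(key)
        reader = csv.reader(io.StringIO(text))
        header = next(reader, None)
        if header is None:  # skip empty files
            continue
        dest = destination_key(segments)
        _, rows = merged.setdefault(dest, (header + ["RequestId", "JobName"], []))
        for row in reader:
            rows.append(row + [segments["RequestId"], segments["JobName"]])

    result = {}
    for dest, (header, rows) in merged.items():
        buf = io.StringIO()
        writer = csv.writer(buf, lineterminator="\n")
        writer.writerow(header)
        writer.writerows(rows)
        result[dest] = buf.getvalue()
    return result
```

For example, two files under acc1/t/r/j/n/req1/jobA/ and acc1/t/r/j/n/req2/jobB/ would end up as a single acc1/t/r/j/n/file2.csv whose rows carry req1/jobA and req2/jobB in the two new columns. This is the result I am after; what I do not know is which AWS service can produce it.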
I tried AWS S3 replication to copy the data from the source to the destination bucket, but I did not find any feature in S3 replication that lets me modify the partition layout and merge the files at replication time.
I want to use the destination bucket to query the data with AWS Athena. For that, I use an AWS Glue crawler to crawl the data and create the database and table that Athena will query.
(Update):
These partitions are not stored as key=value and are too fine-grained, which leads to small files. Having many small files to process will hurt Athena performance.
Question: Is there a way to achieve this transformation using S3 replication? If not, is it possible to achieve it with AWS Glue jobs that transform the data as described above? Or is there any other way to achieve this transformation between S3 buckets? And how?