We are using AWS Glue to convert JSON files stored in our S3 data lake.
Here are the steps I followed:
Created a crawler to generate a table in Glue from our data lake bucket, which contains the JSON data.
The newly created table has the following partitions:
Name, Year, Month, Day, Hour
Created a Glue job to convert the data to Parquet and store it in a different bucket.
With this process the job runs successfully, but the data in the new bucket is not partitioned; it all ends up in a single directory.
What I want to achieve is that the converted Parquet files get the same partitions as the source table/data lake bucket.
Also, I want to increase the size of the Parquet files (reduce the number of files).
Can anyone help me with this?
Please add your write_dynamic_frame code to your question, and the path(s) within your bucket for the resulting files. Have you tried the example code from Managing Partitions for ETL Output in AWS Glue? – Steven Ensslen