I need to perform an append load to an S3 bucket.
- Every day a new .gz file gets dumped to an S3 location, and a Glue crawler reads the data and updates the Data Catalog.
- A Scala AWS Glue job runs and filters the data for the current day only.
- The filtered data is transformed according to some rules, and a DynamicFrame partitioned at the year/month/day level is created (a rough sketch of this step follows the list).
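For context, this is roughly what the filter-and-partition step looks like. The database, table, and column names here (my_db, my_table, eventdate) are illustrative placeholders, not my actual schema:

import java.time.LocalDate
import org.apache.spark.sql.functions.{col, lit}

// Sketch only: read today's slice from the catalog table and derive
// the partition columns. "my_db", "my_table", and "eventdate" are placeholders.
val today = LocalDate.now()
val sourceDf = glueContext
  .getCatalogSource(database = "my_db", tableName = "my_table")
  .getDynamicFrame()
  .toDF()

val currentDayDf = sourceDf
  .filter(col("eventdate") === lit(today.toString))
  .withColumn("partitonyear", lit(today.getYear.toString))
  .withColumn("partitonmonth", lit(f"${today.getMonthValue}%02d"))
  .withColumn("partitonday", lit(f"${today.getDayOfMonth}%02d"))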
Now I need to write this DynamicFrame to an S3 bucket that already contains all of the previous days' partitions. In fact, I only need to write a single partition to the bucket. Currently I am using the piece of code below to write data to S3:
// Write it out in Parquet for ERROR severity
glueContext.getSinkWithFormat(
  connectionType = "s3",
  options = JsonOptions(Map(
    "path" -> "s3://some s3 bucket location",
    "partitionKeys" -> Seq("partitonyear", "partitonmonth", "partitonday"))),
  format = "parquet"
).writeDynamicFrame(
  DynamicFrame(dynamicDataframeToWrite.toDF().coalesce(maxExecutors), glueContext))
I am not sure whether the above piece of code performs an append load or overwrites the existing partitions. Is there a way to achieve an append through the AWS Glue libraries?
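For comparison, the only workaround I can think of is dropping down to the plain Spark DataFrame writer, which has an explicit append mode. This is a sketch of that fallback, not what I am running today, and I would prefer to stay within the Glue DynamicFrame API if it already appends:

import org.apache.spark.sql.SaveMode

// Fallback sketch using the Spark writer instead of the Glue sink:
// SaveMode.Append adds new files without touching existing partitions.
dynamicDataframeToWrite.toDF()
  .coalesce(maxExecutors)
  .write
  .mode(SaveMode.Append)
  .partitionBy("partitonyear", "partitonmonth", "partitonday")
  .parquet("s3://some s3 bucket location")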