I want to load data from Amazon S3 into an AWS Glue dynamic frame, but limit it to a date range.
My data is stored as Parquet files in S3, partitioned by date like this:
s3://bucket/all-dates/year=2021/month=4/day=13/
s3://bucket/all-dates/year=2021/month=4/day=14/
s3://bucket/all-dates/year=2021/month=4/day=15/
s3://bucket/all-dates/year=2021/month=4/day=16/
Currently I load the data into my script as:
ds1 = glueContext.create_dynamic_frame_from_options(
    connection_type="s3",
    connection_options={
        "paths": ["s3://bucket/all-dates/"],
        "recurse": True,
    },
    format="parquet",
)
This works, but it loads all of the data into the dynamic frame. What I would like instead is to recurse only through the files for the last week, or the last two weeks, counted back from the date the script runs.
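To make the goal concrete, here is a rough sketch of the list of prefixes I am after. The helper name `recent_partition_paths` and its parameters are made up for illustration, and the date is pinned only so the output matches the layout above. I have not verified that passing such a list as "paths" is the right way to do this in Glue.

```python
from datetime import date, timedelta

def recent_partition_paths(base, days, today=None):
    """Return one S3 prefix per day for the last `days` days, oldest first."""
    today = today or date.today()
    paths = []
    for offset in range(days - 1, -1, -1):
        d = today - timedelta(days=offset)
        # My partition values are not zero-padded (month=4, not month=04).
        paths.append(f"{base}year={d.year}/month={d.month}/day={d.day}/")
    return paths

# Hypothetical run date; in the real script I would leave `today` unset.
paths = recent_partition_paths("s3://bucket/all-dates/", 7, today=date(2021, 4, 16))
for p in paths:
    print(p)
```

Building the prefixes from real dates rather than string arithmetic means a window that crosses a month or year boundary should still come out right.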
Any help appreciated. Thanks