orig_dyf = glueContext.create_dynamic_frame.from_options(
    "s3",
    {
        "paths": ["s3://bucket/sample_data/"],
        "recurse": True,
        "exclusions": "[\"temp/**\"]"
    },
    "json",
    transformation_ctx="orig_dyf")
I want to exclude the files under the temp folder, but this isn't working. According to https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-connect.html#aws-glue-programming-etl-connect-s3, "exclusions" should be a string containing a JSON list of Unix-style glob patterns. Oddly, when I use
"[\"**.csv\"]"
or another file-suffix pattern, it works. But when I try to exclude a folder, the pattern has no effect and the files in that folder are still included.
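For reference, here is a minimal sketch of how the "exclusions" string can be built with json.dumps instead of hand-escaping the quotes, so the JSON syntax itself can be ruled out as the cause. The helper name build_exclusions is hypothetical and not part of the Glue API. The sketch only produces the string and the options dict. It does not call Glue, and it does not settle whether Glue matches the pattern against the full S3 path or a path relative to "paths".

```python
import json


def build_exclusions(patterns):
    """Serialize glob patterns into a JSON-list string.

    Hypothetical helper for illustration; not part of the Glue API.
    """
    return json.dumps(list(patterns))


# Same pattern as in the question, without manual escaping.
exclusions = build_exclusions(["temp/**"])
print(exclusions)

# Options dict mirroring the call above; pass it as the second
# argument to glueContext.create_dynamic_frame.from_options.
connection_options = {
    "paths": ["s3://bucket/sample_data/"],
    "recurse": True,
    "exclusions": exclusions,
}
```

The resulting string is identical to the hand-escaped "[\"temp/**\"]", so the behaviour I see is the same either way.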
According to https://docs.aws.amazon.com/glue/latest/dg/define-crawler.html#crawler-data-stores-exclude, the pattern
myfolder/**
should match objects in all subfolders of myfolder, such as /myfolder/mysource/mydata and /myfolder/mysource/data.