0
votes

We are trying to load data from an S3 bucket into Snowflake using COPY INTO. Works perfectly.. But data in subfolders are also being copied, and this shoud not not happen.

Following hardcoded pattern REGEX works perfectly

copy into TARGETTABLE
from @SOURCESTAGE
pattern='^(?!.*subfolder/).*$'

But we don't want to hardcode the foldername. When I just keep the '/' it doesn't work anymore.. ( same happens when I escape the slash \/ )

copy into TARGETTABLE
from @SOURCESTAGE
pattern='^(?!.*/).*$'

Does anybody knows which REGEX to use to skip any subfolder in the COP INTO in a dynamic way? (no hardcoding of folder name )

@test_stage/folder_include
@test_stage/folder_include/file_that_has_to_be_loaded.csv
@test_stage/folder_include/folder_exclude/file_that_cannot_be_loaded.csv

So only files within folder_include can be picked up by the copy into statement. Everything in a lower level needs to be skipped. Most importantly: without hardcoding on foldername. Any folder within folder_include has to be ignored.

Thanks!

2
I think this may have already been answered here: How to avoid sub folders in snowflake copy statementNickW
Hi Nick, thanks for answering. In fact it is not. This solution provides a hardcodes fix. Not a dynamic one. The solution shows how to skip subfolders by name. I need a more generic solution.Benoit Turbang
Does the pattern you want to keep contain just a single slash character (i.e. subfolders contain 2 or more slashes)? If so this might help (obviously changing % to /): stackoverflow.com/questions/16762492/…NickW
Any folder within the source stage will contain at least 1 '/' . I tried it but it doesnt help. Thanks.Benoit Turbang
If the top-level folder (the one you want to load from) contains 1 '/' then don't the subfolders have to contain more than 1 '/'? If that is the case then why does filtering on a single '/' not work?NickW

2 Answers

0
votes

Here (like mentioned in the comments) is a solution for skipping a hardcoded foldername: How to avoid sub folders in snowflake copy statement

In my opinion replacing the hardcoded-part with .* makes it generic.

Kind regards :)

0
votes

If the PATH that's included in STAGE is static, you can include that in your pattern.

list @SOURCESTAGE PATTERN = 'full_path_to_folder_include/[^/]*'

Even if your path include environment specific folder (for eg. DEV, PROD), you can account for that:

list @SOURCESTAGE PATTERN = 'static_path/[^/]+/path_to_folder/[^/]*'
or
list @SOURCESTAGE PATTERN = 'static_path/(dev|test|prod)/path_to_folder/[^/]*'