
I am loading CSV files from Amazon S3 into Snowflake by creating a Snowflake external stage that points to S3 and then running a COPY command. From what I understand, the PURGE option controls whether the stage's files are deleted or left intact once the load finishes. I reuse the same stage for subsequent loads of the same kind, and with purge disabled the old files would remain and keep stacking up in the stage, creating duplicates. The REMOVE command does clear the stage, but it also deletes my S3 files.

Is there a way that I can purge the stage while leaving the S3 files intact?
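For reference, the workflow described above looks roughly like this (the stage name, bucket URL, credentials, and target table are placeholders, not taken from the question):

```sql
-- External stage pointing at the S3 location (all names are placeholders)
CREATE OR REPLACE STAGE my_s3_stage
  URL = 's3://my-bucket/incoming/'
  CREDENTIALS = (AWS_KEY_ID = '...' AWS_SECRET_KEY = '...')
  FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1);

-- Load every file currently at the stage location
COPY INTO my_table FROM @my_s3_stage;
```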

You can create a stage from your AWS S3 bucket and then load directly into Snowflake using that stage. Have you tried that approach? – hopeIsTheonlyWeapon
Then you don't have to purge at all. Please let me know. – hopeIsTheonlyWeapon
Also, if you are using an external stage, then it is up to you how you write the files into your S3 bucket. Please let me know if I am understanding your question correctly. – hopeIsTheonlyWeapon
Sorry if my question was unclear. I am creating a stage whose URL is my S3 bucket (specifically a file), then following with a COPY INTO from that stage. I am using this stage for the same workflow in subsequent calls and would like the previously staged data to be cleared. – Alvin
Why don't you name the files so that each one is unique inside S3? Also, the Snowflake COPY command is intelligent enough not to load the same file again. You can then use an S3 lifecycle rule to remove the S3 objects at your will. Does that answer your question? – hopeIsTheonlyWeapon

1 Answer


The answer to your initial question, "Is there a way that I can purge the stage while leaving the S3 files intact?", is no. An external stage is just a reference to a file location (and the files in that location), so purging the stage means deleting the files in the referenced location; that is what "purging" is. Purging the stage while keeping the files in that location is therefore not logically possible.
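In Snowflake, purging is expressed as a copy option on the COPY statement itself, which makes the point concrete: the only thing there to delete is the files at the external location. A minimal sketch, assuming the placeholder stage and table names used here:

```sql
-- PURGE = TRUE deletes the successfully loaded files from the
-- underlying S3 location -- there is no separate "stage copy" to clear
COPY INTO my_table
FROM @my_s3_stage
PURGE = TRUE;
```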

As mentioned in the comments, if you want to keep a copy of the files in S3, then when you write them to the stage location, also copy them to a second S3 location at the same time.

I don't entirely understand what you mean by "I am using the same stage for subsequent calls of the same nature". I assume you are not trying to load the same files again, so if each call loads a different set of files, why not just use a different stage referencing a different S3 location?

As also mentioned in the comments, even if you keep loading data from the same stage (without purging), you won't create duplicates, because Snowflake recognises files it has already processed and won't reload them.
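This behaviour comes from Snowflake's load history for the target table; you would have to opt in explicitly to reload a file. A sketch, again with placeholder names:

```sql
-- Second run of the same statement: files already loaded are skipped
COPY INTO my_table FROM @my_s3_stage;

-- Only FORCE = TRUE overrides the load history and reloads
-- (and therefore duplicates) already-loaded files
COPY INTO my_table FROM @my_s3_stage FORCE = TRUE;
```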