Is it possible to call a Snowflake procedure (it has a MERGE statement that copies data from a stage to the main Snowflake table) through an AWS Lambda function? I want the function to be triggered as soon as we push a file to the S3 stage. We have the Snowpipe option to copy the data, but I don't want an extra table to copy the data from the S3 bucket into a Snowflake staging table and then merge it into the master table; instead, I have a MERGE statement that merges the data directly from the file in the S3 bucket into the Snowflake master table.
0
votes
I would recommend using Snowpipe for this. AWS Lambda has a limit on how long it can run (I think it is 15 minutes). If your processing takes longer than that, the invocation will time out.
– Rajib Deb
I believe Snowpipe doesn't have the flexibility to use a MERGE operation; it only supports COPY from the stage into the main table. To reduce extra computation and additional storage, I am trying to use MERGE directly from the S3 stage into the Snowflake master table. If duration is an issue, I can split the file into smaller pieces.
– Ashish Kumar
1 Answer
1
votes
First, I would create an external stage in Snowflake:
CREATE STAGE YOUR_STAGE_NAME
  URL = 's3://your-bucket-name'
  CREDENTIALS = (
    AWS_KEY_ID = 'xxx'
    AWS_SECRET_KEY = 'yyy'
  )
  FILE_FORMAT = (
    TYPE = CSV
    COMPRESSION = AUTO
    FIELD_DELIMITER = ','
    SKIP_HEADER = 0
    ....
  );
Then I would run a query against your CSV file in the stage from your Python Lambda script (pulling the file name from the S3 event payload):
WITH file_data AS (
  SELECT t.$1, t.$2, <etc..> FROM @YOUR_STAGE_NAME/path/your-file.csv t
)
MERGE INTO MASTER_TABLE USING file_data ON <......>
You should be able to wrap this in a stored procedure if you so desire. There is also syntax that skips creating a named external stage and references the bucket name and credentials directly in the query, but I wouldn't recommend it, since it would expose your AWS credentials in plaintext in the stored procedure definition.
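To make the Lambda side concrete, here is a minimal sketch of a handler that pulls the object key from the S3 put event and runs the MERGE through the `snowflake-connector-python` library. Everything here is illustrative: `MASTER_TABLE`, `YOUR_STAGE_NAME`, the column names (`id`, `val`), and the `SF_*` environment variables are assumptions, not names from your actual setup, and you would adapt the column list and join condition to your data.

```python
import os
import urllib.parse


def build_merge_sql(key: str) -> str:
    """Build the MERGE statement for one staged file.

    The file path comes from the S3 event; the table, stage, and column
    names below are placeholders for illustration only.
    """
    return (
        "MERGE INTO MASTER_TABLE USING ("
        f" SELECT t.$1 AS id, t.$2 AS val FROM @YOUR_STAGE_NAME/{key} t"
        ") file_data ON MASTER_TABLE.id = file_data.id "
        "WHEN MATCHED THEN UPDATE SET MASTER_TABLE.val = file_data.val "
        "WHEN NOT MATCHED THEN INSERT (id, val) "
        "VALUES (file_data.id, file_data.val)"
    )


def lambda_handler(event, context):
    # S3 put events deliver the object key URL-encoded (spaces as '+').
    key = urllib.parse.unquote_plus(
        event["Records"][0]["s3"]["object"]["key"]
    )

    # snowflake-connector-python must be packaged with the Lambda
    # deployment artifact (e.g. in a layer); imported lazily here so the
    # pure SQL-building logic above stays testable without it.
    import snowflake.connector

    conn = snowflake.connector.connect(
        user=os.environ["SF_USER"],
        password=os.environ["SF_PASSWORD"],
        account=os.environ["SF_ACCOUNT"],
        warehouse=os.environ["SF_WAREHOUSE"],
        database=os.environ["SF_DATABASE"],
        schema=os.environ["SF_SCHEMA"],
    )
    try:
        conn.cursor().execute(build_merge_sql(key))
    finally:
        conn.close()
```

As the comment thread notes, keep the per-file MERGE well under Lambda's 15-minute cap; splitting large files into smaller ones, so each invocation merges one small file, fits this event-per-object pattern naturally.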