
Is it possible to call a Snowflake procedure (one containing a MERGE statement that copies data from a stage into the main Snowflake table) from an AWS Lambda function? I want the function to be triggered as soon as we push a file to the S3 stage. We have the Snowpipe option to copy the data, but I don't want another intermediate table, i.e., copying the data from the S3 bucket into the Snowflake stage and then merging it into the master table. Instead, I have a MERGE statement that merges the data directly from the file in the S3 bucket into the Snowflake master table.

I would recommend using Snowpipe for this. AWS Lambda has a limit on how long it can run; I think it is 15 minutes. If your processing takes more time, it will time out. – Rajib Deb
I believe Snowpipe doesn't have the flexibility to use a MERGE operation; it only supports a COPY operation from the stage into the main table. To reduce extra computation and additional space, I am trying to use a MERGE operation directly from the S3 stage into the Snowflake master table. If the time limit is an issue, I can think about splitting the file into smaller pieces. – Ashish Kumar

1 Answer


First, I would create an external stage in Snowflake:

CREATE STAGE YOUR_STAGE_NAME
  URL = 's3://your-bucket-name'
  CREDENTIALS = (
    AWS_KEY_ID = 'xxx'
    AWS_SECRET_KEY = 'yyy'
  )
  FILE_FORMAT = (
    TYPE = CSV
    COMPRESSION = AUTO
    FIELD_DELIMITER = ','
    SKIP_HEADER = 0
    ....
  );

Then I would run a query against your CSV file in the stage from your Python Lambda script (pulling the file name out of the S3 event payload). Note that Snowflake's MERGE does not take a leading WITH clause, so the staged file goes into the USING subquery:

MERGE INTO MASTER_TABLE USING (
  SELECT t.$1, t.$2, <etc..>
  FROM @YOUR_STAGE_NAME/path/your-file.csv t
) file_data ON <......>
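
Putting it together, here is a minimal sketch of what the Lambda handler might look like, assuming the snowflake-connector-python package is bundled with the function and the connection parameters are supplied as environment variables. The column names id/val and the join condition are placeholder assumptions you would replace with your own:

import os
import urllib.parse

import snowflake.connector

def lambda_handler(event, context):
    # S3 put events carry the key of the object that was just uploaded
    key = urllib.parse.unquote_plus(event["Records"][0]["s3"]["object"]["key"])

    conn = snowflake.connector.connect(
        user=os.environ["SNOWFLAKE_USER"],
        password=os.environ["SNOWFLAKE_PASSWORD"],
        account=os.environ["SNOWFLAKE_ACCOUNT"],
        warehouse=os.environ["SNOWFLAKE_WAREHOUSE"],
        database=os.environ["SNOWFLAKE_DATABASE"],
        schema=os.environ["SNOWFLAKE_SCHEMA"],
    )
    try:
        # Merge straight from the staged file; id/val are hypothetical columns
        conn.cursor().execute(f"""
            MERGE INTO MASTER_TABLE m
            USING (
              SELECT t.$1 AS id, t.$2 AS val
              FROM @YOUR_STAGE_NAME/{key} t
            ) f
            ON m.id = f.id
            WHEN MATCHED THEN UPDATE SET m.val = f.val
            WHEN NOT MATCHED THEN INSERT (id, val) VALUES (f.id, f.val)
        """)
    finally:
        conn.close()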

You should be able to wrap this in a stored procedure if you so desire (see the sketch below). There is also syntax that lets you skip creating the named external stage and reference the bucket name and credentials directly in the query, but I wouldn't recommend that, since it would expose your AWS credentials in plaintext in the stored procedure definition.
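
If you do go the stored procedure route, the body of the try block above shrinks to a single parameterized call. MERGE_FROM_STAGE is a hypothetical procedure name here, and the statement uses the connector's default pyformat parameter binding:

# Replace the inline MERGE in the handler with a call to a (hypothetical)
# stored procedure that takes the staged file path as its argument
conn.cursor().execute("CALL MERGE_FROM_STAGE(%s)", (key,))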