2
votes

New JSON files are dumped into an S3 bucket daily. I have to create a solution that picks up the latest file when it arrives, parses the JSON, and loads it into the Snowflake data warehouse. Could someone please share thoughts on how we can achieve this?

4 Answers

3
votes

There are a number of ways to do this depending on your needs. I would suggest creating an S3 event notification to trigger a Lambda function.
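As a minimal sketch of that approach (the bucket contents and the actual load step are placeholders, not from the question), a Lambda handler wired to an S3 `ObjectCreated` notification only needs to pull the bucket and key out of the event before handing off to whatever loads the file into Snowflake:

```python
import urllib.parse


def lambda_handler(event, context):
    """Triggered by an S3 ObjectCreated event notification.

    Returns the (bucket, key) pairs of the newly arrived JSON files;
    parsing and loading into Snowflake would happen here (for example
    via Snowpipe's REST API or the Snowflake Python connector).
    """
    files = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # S3 event keys are URL-encoded (spaces arrive as '+')
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        if key.endswith(".json"):
            files.append((bucket, key))
    return files
```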

https://docs.aws.amazon.com/lambda/latest/dg/with-s3.html

Another option is to have S3 publish an SQS message when the file lands in the bucket, and have an EC2 instance poll the queue and process files as necessary.

https://docs.aws.amazon.com/AmazonS3/latest/dev/NotificationHowTo.html

https://boto3.amazonaws.com/v1/documentation/api/latest/guide/sqs-example-long-polling.html
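A rough sketch of the polling side with boto3 (queue URL is a placeholder; the message body is the S3 event notification JSON, and the parsing helper is kept separate so it can run without AWS credentials):

```python
import json


def s3_keys_from_sqs_body(body):
    """Extract (bucket, key) pairs from an S3 event notification
    delivered through SQS (the SQS message body is the event JSON)."""
    event = json.loads(body)
    return [(r["s3"]["bucket"]["name"], r["s3"]["object"]["key"])
            for r in event.get("Records", [])]


if __name__ == "__main__":
    # Long-poll the queue from an EC2 instance; boto3 is imported here
    # so the parser above stays importable without AWS set up.
    import boto3

    sqs = boto3.client("sqs")
    queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/new-files"  # placeholder
    while True:
        resp = sqs.receive_message(QueueUrl=queue_url,
                                   MaxNumberOfMessages=10,
                                   WaitTimeSeconds=20)  # long polling
        for msg in resp.get("Messages", []):
            for bucket, key in s3_keys_from_sqs_body(msg["Body"]):
                print(f"process s3://{bucket}/{key}")  # parse + load here
            sqs.delete_message(QueueUrl=queue_url,
                               ReceiptHandle=msg["ReceiptHandle"])
```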

Edit: here is a more detailed explanation of how to create events from S3 and trigger Lambda functions. The documentation is provided by Snowflake:

https://docs.snowflake.net/manuals/user-guide/data-load-snowpipe-rest-lambda.html

0
votes

Look into Snowpipe; it lets you do this within Snowflake itself, which (possibly) makes it much easier.
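To make the suggestion concrete: with `AUTO_INGEST = TRUE`, Snowflake loads new files as soon as S3 event notifications arrive on an SQS queue that Snowflake provides, so no custom polling code is needed. A sketch of the DDL you would run (pipe, table, and stage names are hypothetical), built as a string so it could be executed with the Snowflake Python connector:

```python
def snowpipe_ddl(pipe_name, table_name, stage_name):
    """Build a CREATE PIPE statement with auto-ingest enabled.

    Snowflake then copies each new file from the stage into the table
    automatically whenever an S3 event notification arrives.
    """
    return (
        f"CREATE PIPE {pipe_name} AUTO_INGEST = TRUE AS "
        f"COPY INTO {table_name} "
        f"FROM @{stage_name} "
        f"FILE_FORMAT = (TYPE = 'JSON')"
    )


# Executing it would look roughly like (requires snowflake-connector-python):
#   import snowflake.connector
#   conn = snowflake.connector.connect(user=..., password=..., account=...)
#   conn.cursor().execute(snowpipe_ddl("json_pipe", "raw_json", "json_stage"))
```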

0
votes

There are some aspects to consider, such as: is it batch or streaming data? Do you want to retry loading a file when the data or format is wrong? Do you want a generic process that can handle different file formats/types (CSV/JSON) and stages? In our case we built a generic S3-to-Snowflake load using Python and Luigi, and also implemented the same using SSIS, but for CSV/TXT files only.

0
votes

In my case, I have a Python script that gets information about the bucket with boto.

Once I detect a change, I call the Snowpipe REST endpoint insertFiles.

Phasing:

  • detect the S3 change
  • get the S3 object path
  • parse the content and transform it to CSV in S3 (same bucket, or another one Snowpipe can connect to)
  • call the Snowpipe REST API

What you need:

  • Create a user with a public key
  • Create your stage on Snowflake with AWS credentials in order to access S3
  • Create your pipe on Snowflake with your user's role
  • Sign a JWT
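The JWT and REST-call pieces above look roughly like this (account, user, and pipe names are placeholders; the actual RS256 signing with the private key matching the registered public key is left as a comment so the sketch stays dependency-free):

```python
import datetime


def snowpipe_jwt_claims(account, user, public_key_fingerprint, lifetime_s=3600):
    """Claim set for Snowflake key-pair authentication: the issuer is
    ACCOUNT.USER.<fingerprint of the registered public key>, the
    subject is ACCOUNT.USER, and the token is short-lived."""
    now = int(datetime.datetime.now(datetime.timezone.utc).timestamp())
    qualified_user = f"{account.upper()}.{user.upper()}"
    return {
        "iss": f"{qualified_user}.{public_key_fingerprint}",
        "sub": qualified_user,
        "iat": now,
        "exp": now + lifetime_s,
    }


def insert_files_url(account, database, schema, pipe):
    """URL of the Snowpipe REST insertFiles endpoint for a given pipe."""
    return (f"https://{account}.snowflakecomputing.com"
            f"/v1/data/pipes/{database}.{schema}.{pipe}/insertFiles")


# Signing and calling (requires PyJWT + requests), roughly:
#   token = jwt.encode(claims, private_key, algorithm="RS256")
#   requests.post(url, json={"files": [{"path": "path/to/file.csv"}]},
#                 headers={"Authorization": f"Bearer {token}"})
```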

I also tried with a Talend job with TOS BigData.

Hope it helps.