0 votes

I am a newbie to AWS and Snowflake. I am looking to load CSV files from S3 into their respective Snowflake tables (about 100 tables) using AWS Glue. I was able to load data into one Snowflake table using the article below:

https://support.snowflake.net/s/article/How-to-Set-up-AWS-Glue-ETL-for-Snowflake

Is it possible to use one AWS Glue job to load a list of tables?

Inside AWS Glue, can we write logic to update or insert data in Snowflake based on the CSV files?

Please advise and share any sample code/solutions if available.

Thanks, Jo

3

I know you are asking about Glue specifically, but as someone else pointed out, you can use other tools that aren't so heavyweight. I would look into Snowflake's Snowpipe service. Basically, you set up a notification in S3 and some additional configuration in Snowflake; Snowflake will then auto-ingest new records from S3 without any jobs you need to maintain: docs.snowflake.com/en/user-guide/… – Brock

3 Answers

0 votes

First of all, if you do not need Spark to process or transform the data in your CSV files, using the Snowflake COPY command would be a better option. Under the hood, AWS Glue (Spark) also uploads the files to an internal stage and uses the COPY command to insert the data into the Snowflake database.

For using the COPY command to load data, see:

https://docs.snowflake.com/en/sql-reference/sql/copy-into-table.html

https://docs.snowflake.com/en/user-guide/data-load-external-tutorial.html
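For example, a minimal COPY statement might look like this (the table, stage, and file-format names here are illustrative assumptions, not from the question):

```sql
-- Illustrative names: MY_TABLE, my_stage, and my_csv_format are assumptions.
COPY INTO MY_TABLE
  FROM @my_stage/path/
  FILE_FORMAT = (FORMAT_NAME = 'my_csv_format')
  ON_ERROR = 'ABORT_STATEMENT';  -- stop the load on the first bad record
```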

About your questions:

Is it possible to use 1 aws glue to load a list of tables?

Yes, it's possible to use one AWS Glue job to load multiple tables. AWS Glue is a flexible tool in which you can write custom Spark code. That said, for simplicity, I recommend using one job per table.
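If you do go with a single job, one common pattern is to derive the target table name from the S3 object key and loop over the files. A minimal sketch, assuming a `<prefix>/<table_name>/<file>.csv` key layout (the layout and names are my assumptions, not from the question); the resulting name would be passed to the Snowflake Spark connector as the target table:

```python
def table_for_key(key: str) -> str:
    """Derive the target Snowflake table name from an S3 object key.

    Assumes keys look like "landing/customers/2023-01-01.csv", where the
    second-to-last path segment names the table (an illustrative layout).
    """
    parts = key.strip("/").split("/")
    if len(parts) < 2:
        raise ValueError(f"unexpected key layout: {key!r}")
    return parts[-2].upper()
```

For example, `table_for_key("landing/orders/part-0001.csv")` returns `"ORDERS"`, so one job can route all ~100 CSV prefixes without hard-coding each table.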

Inside AWS Glue - can we write logic to update or insert data in snowflake based on csv files ?

Yes, you can, but Spark is designed to process bulk data and Snowflake is a data warehouse; updating or inserting single rows will be inefficient for both. For running DML statements, check:

https://docs.snowflake.com/en/user-guide/spark-connector-use.html#executing-ddl-dml-sql-statements
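An upsert is typically done in bulk: load the CSV data into a staging table, then run a single MERGE against the target using the connector's DML support linked above. A hedged sketch of building such a MERGE statement (the table and column names in the example are illustrative):

```python
from typing import Sequence

def build_merge_sql(target: str, staging: str,
                    key_cols: Sequence[str], cols: Sequence[str]) -> str:
    """Build a Snowflake MERGE that upserts staging rows into target.

    key_cols identify matching rows; cols is the full column list.
    """
    on = " AND ".join(f"t.{c} = s.{c}" for c in key_cols)
    updates = ", ".join(f"t.{c} = s.{c}" for c in cols if c not in key_cols)
    insert_cols = ", ".join(cols)
    insert_vals = ", ".join(f"s.{c}" for c in cols)
    return (
        f"MERGE INTO {target} t USING {staging} s ON {on} "
        f"WHEN MATCHED THEN UPDATE SET {updates} "
        f"WHEN NOT MATCHED THEN INSERT ({insert_cols}) VALUES ({insert_vals})"
    )
```

The generated statement could then be executed through the mechanism described in the linked docs, so the update/insert happens as one set-based operation instead of row by row.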

0 votes

There is a simple process to load data into tables in Snowflake. Please refer to the video below.

https://www.youtube.com/watch?v=KslOVvXy1R4&feature=youtu.be

SELECT t.$1 AS MONTH_NUM, t.$2 AS MONTH_NAME
FROM @mys3stage (file_format => 'myfileformat') t;
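The query above assumes the stage and file format already exist. A minimal setup might look like this (the S3 URL is a placeholder I've made up, and a private bucket would additionally need credentials or a storage integration configured on the stage):

```sql
-- Illustrative setup; the URL is a placeholder, not from the answer.
CREATE FILE FORMAT myfileformat
  TYPE = CSV
  SKIP_HEADER = 1;

CREATE STAGE mys3stage
  URL = 's3://my-bucket/my-prefix/'
  FILE_FORMAT = myfileformat;
```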