1
votes

I am evaluating Snowflake for a reporting use case and am considering Snowpipe for ETL. Data is ingested from S3. The data in S3 contains information about user sessions captured at regular intervals. In Snowflake, I want to store this data in aggregated form. Per the documentation, Snowflake supports only basic transformations and does not support GROUP BY or JOIN while copying data from the S3 stage into Snowflake tables.

I am new to ETL and Snowflake. One approach I was considering is to load the raw detailed data from the stage into a temporary table in Snowflake, then run aggregations (GROUP BY and JOIN) on the temporary table to load the data into the final fact tables. Is this the correct approach for implementing complex transformations?
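The two-step approach described above could be sketched roughly like this (stage, table, and column names here are hypothetical, not from the question):

```sql
-- Step 1: load raw session data from the S3 stage into a staging table.
COPY INTO raw_sessions (user_id, session_id, captured_at, duration_seconds)
FROM @my_s3_stage
FILE_FORMAT = (TYPE = 'CSV');

-- Step 2: aggregate with GROUP BY / JOIN into the final fact table.
INSERT INTO fact_session_daily (user_id, session_date, session_count, total_duration)
SELECT r.user_id,
       DATE(r.captured_at),
       COUNT(DISTINCT r.session_id),
       SUM(r.duration_seconds)
FROM   raw_sessions r
JOIN   dim_users u ON u.user_id = r.user_id
GROUP  BY r.user_id, DATE(r.captured_at);
```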

You are thinking in the right way, but also check whether a CTE (Common Table Expression) would work better than a temporary table in some cases. – Rajib Deb
Unless you need to continuously stream the data into Snowflake (which I'm guessing you couldn't aggregate on the fly anyway), just use COPY INTO ... instead of Snowpipe. – NickW

2 Answers

2
votes

Temporary tables in Snowflake persist only for the session in which they were created. This means you won't be able to point a Snowpipe at one.

Instead of a temporary table, point Snowpipe at a transient table to store the raw data, and then truncate the table after some period of time; this reduces storage costs. Personally, I'd keep the data in the transient table for as long as possible, provided it is not too cost-prohibitive, to account for potentially late-arriving data.
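A minimal sketch of that setup, assuming hypothetical table and column names (transient tables carry no Fail-safe storage, which is where the cost saving comes from):

```sql
-- Transient landing table for Snowpipe to load into.
CREATE TRANSIENT TABLE raw_sessions (
    user_id          NUMBER,
    session_id       VARCHAR,
    captured_at      TIMESTAMP_NTZ,
    duration_seconds NUMBER
) DATA_RETENTION_TIME_IN_DAYS = 0;  -- minimize Time Travel storage as well

-- Periodically purge rows that have already been aggregated,
-- keeping a window (e.g. 30 days) for late-arriving data.
DELETE FROM raw_sessions
WHERE captured_at < DATEADD(day, -30, CURRENT_TIMESTAMP());
```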

0
votes

Yes, your approach looks good to me.

Snowpipe loads your data continuously from S3 into Snowflake, and within Snowflake you can use

  1. Views
  2. Tables and Stored Procedures

to transform the data and load it into your final fact table.
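One way to combine those pieces, sketched with hypothetical names (a view that holds the aggregation logic, a stored procedure that loads the fact table, and a task that schedules it):

```sql
-- View encapsulating the GROUP BY / aggregation logic.
CREATE OR REPLACE VIEW v_session_daily AS
SELECT user_id,
       DATE(captured_at)     AS session_date,
       COUNT(*)              AS session_count,
       SUM(duration_seconds) AS total_duration
FROM   raw_sessions
GROUP  BY user_id, DATE(captured_at);

-- Stored procedure (Snowflake Scripting) that loads the fact table from the view.
CREATE OR REPLACE PROCEDURE load_fact_session_daily()
RETURNS STRING
LANGUAGE SQL
AS
$$
BEGIN
    INSERT INTO fact_session_daily (user_id, session_date, session_count, total_duration)
    SELECT user_id, session_date, session_count, total_duration
    FROM   v_session_daily;
    RETURN 'done';
END;
$$;

-- Optionally schedule the load, e.g. with a task.
CREATE OR REPLACE TASK t_load_fact
  WAREHOUSE = my_wh
  SCHEDULE  = '60 MINUTE'
AS
  CALL load_fact_session_daily();
```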