1 vote

(Submitting on behalf of a Snowflake User)


We have a database that stores raw data from all our local sources. My team has its own environment in which we have full permissions to create standardized feeds, tables, views, etc. that are ready to consume through Power BI. A few additional details:

  • The final 'feed' tables are derived through SQL statements, and most pull from more than one of our 'raw' tables.
  • Raw table data is updated daily.

My question is: what is the best operation to keep these tables fully updated, and what is the standard workflow for it? Our current understanding is that one of these processes is best:

  1. Using COPY INTO <stage> then COPY INTO <table>.
  2. Using STREAMS to add incremental data.
  3. Using PIPES (which may be the same as STREAMS?).
  4. Or simplifying our feeds to single-table sources and using materialized views (a rough sketch of what we have in mind follows this list).
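For option 4, this is roughly the idea. The names below are hypothetical, and we understand a materialized view can only reference a single source table (and that the feature requires Enterprise Edition):

    -- Hypothetical single-table feed published as a materialized view.
    CREATE MATERIALIZED VIEW FEEDS.PUBLIC.DAILY_ORDERS_MV AS
    SELECT order_date,
           region,
           SUM(amount) AS total_amount
    FROM RAW.PUBLIC.ORDERS            -- hypothetical raw table
    GROUP BY order_date, region;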

We'd ideally like to avoid views to improve consumption speed at the Power BI level.


Tasks have been recommended and seem like a good fit, since the final tables only need to be updated once per day (https://docs.snowflake.net/manuals/sql-reference/sql/create-task.html).
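As a rough sketch of what that could look like (warehouse, schema, and table names below are hypothetical):

    -- Rebuild the final feed table once per day from the raw tables.
    CREATE OR REPLACE TASK FEEDS.PUBLIC.REFRESH_DAILY_FEED
      WAREHOUSE = TRANSFORM_WH                  -- hypothetical warehouse
      SCHEDULE  = 'USING CRON 0 6 * * * UTC'    -- once per day at 06:00 UTC
    AS
      INSERT OVERWRITE INTO FEEDS.PUBLIC.DAILY_FEED
      SELECT r.order_id, r.order_date, c.customer_name, r.amount
      FROM   RAW.PUBLIC.ORDERS    r
      JOIN   RAW.PUBLIC.CUSTOMERS c ON c.customer_id = r.customer_id;

    -- Tasks are created suspended; resume them to start the schedule.
    ALTER TASK FEEDS.PUBLIC.REFRESH_DAILY_FEED RESUME;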

Any other recommendations??? THANKS!


1 Answer

1 vote

We have a similar scenario where our raw data lake tables are updated in real time from files in S3. Those raw tables are loaded via Snowpipe using the auto-ingest feature.
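A minimal sketch of that setup, with hypothetical pipe, stage, and table names (the external stage points at the S3 bucket, and auto-ingest is driven by the bucket's event notifications):

    -- Continuously load new S3 files into the raw table as they land.
    CREATE OR REPLACE PIPE RAW.PUBLIC.ORDERS_PIPE
      AUTO_INGEST = TRUE
    AS
      COPY INTO RAW.PUBLIC.ORDERS
      FROM @RAW.PUBLIC.ORDERS_STAGE             -- hypothetical external stage over the S3 bucket
      FILE_FORMAT = (TYPE = 'CSV');

SHOW PIPES returns the notification channel that the bucket's event notifications need to target.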

In turn, we have a data mart that contains facts about the raw data. To update the data mart, we created streams on top of the raw tables to track changes. We then use tasks that run at a given frequency (every five minutes in our case) to update the data mart from the changed data in the raw tables. Using streams lets us limit processing to only the changed data, without having to track last-update dates, etc.
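A minimal sketch of the pattern, again with hypothetical names (our real tasks do more work than this single MERGE):

    -- Track changes on the raw table.
    CREATE OR REPLACE STREAM RAW.PUBLIC.ORDERS_STREAM ON TABLE RAW.PUBLIC.ORDERS;

    -- Every five minutes, fold the changed rows into the data mart.
    CREATE OR REPLACE TASK MART.PUBLIC.REFRESH_ORDER_FACTS
      WAREHOUSE = TRANSFORM_WH                                  -- hypothetical warehouse
      SCHEDULE  = '5 MINUTE'
      WHEN SYSTEM$STREAM_HAS_DATA('RAW.PUBLIC.ORDERS_STREAM')   -- skip runs when nothing changed
    AS
      MERGE INTO MART.PUBLIC.ORDER_FACTS f
      USING (
        -- Keep only the new row images; an update appears in the stream as DELETE + INSERT.
        SELECT * FROM RAW.PUBLIC.ORDERS_STREAM
        WHERE METADATA$ACTION = 'INSERT'
      ) s
        ON f.order_id = s.order_id
      WHEN MATCHED THEN UPDATE SET f.order_date = s.order_date, f.amount = s.amount
      WHEN NOT MATCHED THEN INSERT (order_id, order_date, amount)
                            VALUES (s.order_id, s.order_date, s.amount);

    ALTER TASK MART.PUBLIC.REFRESH_ORDER_FACTS RESUME;

Because the MERGE consumes the stream, the stream's offset advances once the task's DML commits, which is what removes the need to track last-update dates ourselves.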