
I'm loading data into Snowflake using an approach I found on the forums:

snowpipe -> load_table <-> staging_table -> final_table

I have a task tree that checks the stream on the load_table and, if it finds data, swaps the load_table with the staging_table. Further tasks process the staging_table into the final table. The staging_table is then truncated and swapped back with the load_table.
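A minimal sketch of the swap step described above might look like this (all table, stream, and column names are illustrative, not taken from the actual setup):

```sql
-- Hypothetical task body; the task's WHEN clause would use
-- SYSTEM$STREAM_HAS_DATA('load_stream') to decide whether to run.

-- Swap the freshly loaded data out from under Snowpipe
alter table load_table swap with staging_table;

-- ...downstream tasks merge staging_table into final_table here...

-- Clean up and swap back so Snowpipe keeps loading into load_table
truncate table staging_table;
alter table staging_table swap with load_table;
```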

This typically works fine, but the problem I am seeing is that I end up with orphan records in either the load_table or the staging_table while the load_stream is empty. It's at the point now where even if I manually insert data into the load_table, the stream is still marked as empty, so no tasks run.

What is the expected behaviour when swapping tables that have streams on them? Is the above behaviour supported, or do I need to look at an alternative? The goal is to use Snowpipe to load files from S3 into a temp table and merge them into a final table, without having an ever-growing staging table to manage...

Thanks!

/edit After doing some experimenting, it seems that when the tables are swapped the stream still tracks the "original" table, so it will ignore any data that Snowpipe loads into the "new" table, even though that table has been swapped with the original...

Why are you using a staging table at all? Why not run the merge statement directly off the stream? I don't think this is the right use-case for a table swap. – Simon D

In that scenario, how do I remove processed records from the load table? – user2771842

If you're using an append_only stream then you can truncate the table after you run your merge statement. Any records that were inserted while your merge statement was running but not picked up by it will be deleted from the load table, but they will still remain in the stream and will be processed on the next run of the merge. – Simon D

@SimonDarr is correct here. You are over-complicating your process and not really leveraging the full benefits of STREAMS. Just have a single table with a stream over it and MERGE directly from the stream. – Mike Walton
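The pattern suggested in the comments could be sketched like this (a hedged example; table, stream, and column names are made up for illustration):

```sql
-- Append-only stream over the table Snowpipe loads into
create or replace stream load_stream on table load_table append_only = true;

-- Merge directly from the stream; consuming it in a DML statement
-- advances the stream's offset past the rows it read
merge into final_table f
using load_stream s
  on f.id = s.id
when matched then update set f.value = s.value
when not matched then insert (id, value) values (s.id, s.value);

-- Safe with an append-only stream: rows inserted while the merge ran
-- are deleted from the table but remain in the stream, so they are
-- picked up on the next run of the merge
truncate table load_table;
```

This avoids the table swap entirely, so the stream never loses track of its source table.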

1 Answer


The problem is that SHOW STREAMS and DESCRIBE STREAM provide wrong information:

create or replace table test1 (v varchar);
create or replace table test2 (v1 varchar, v2 varchar);
create or replace stream test_stream_1 on table test1;
alter table test1 swap with test2;
show streams like 'test_stream_1';

+---------------+------------------------+
|     name      |       table_name       |
+---------------+------------------------+
| TEST_STREAM_1 | GOKHAN_DB.PUBLIC.TEST1 |
+---------------+------------------------+

It should point to GOKHAN_DB.PUBLIC.TEST2 after the swap operation! I suggest you submit a support case.

The good thing is, get_ddl returns the correct result:

select get_ddl('stream','test_stream_1');

+--------------------------------------------------------+
|           GET_DDL('STREAM','TEST_STREAM_1')            |
+--------------------------------------------------------+
| create or replace stream TEST_STREAM_1 on table TEST2; |
+--------------------------------------------------------+
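You can confirm that the stream really follows the renamed object, as GET_DDL indicates, by inserting into each table after the swap (continuing the example above):

```sql
-- The stream was created on the object originally named TEST1,
-- which is now named TEST2 after the swap
insert into test2 values ('a', 'b');
insert into test1 values ('c');

-- The stream should show only the row inserted into TEST2;
-- the row inserted into TEST1 is ignored, matching the behaviour
-- described in the question's edit
select * from test_stream_1;
```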