3
votes

I am seeing that stream dataset became empty even if I consume only one record from stream dataset. I dont think it is correct.

These are the steps I did. -- Create a table to store the names and fees paid by members of a gym create or replace table members ( id number(8) not null, name varchar(255) default null, fee number(3) null );

-- Create a stream to track changes to date in the MEMBERS table create or replace stream member_check on table members;

-- Create a table to store the dates when gym members joined create or replace table signup ( id number(8), dt date );

insert into members (id,name,fee) values (1,'Joe',0), (2,'Jane',0), (3,'George',0), (4,'Betty',0), (5,'Sally',0);

insert into signup values (1,'2018-01-01'), (2,'2018-02-15'), (3,'2018-05-01'), (4,'2018-07-16'), (5,'2018-08-21');

-- The stream records the inserted rows select * from member_check;

+----+--------+-----+-----------------+-------------------+------------------------------------------+
| ID | NAME   | FEE | METADATA$ACTION | METADATA$ISUPDATE | METADATA$ROW_ID                          |
|----+--------+-----+-----------------+-------------------+------------------------------------------|
|  1 | Joe    |   0 | INSERT          | False             | d200504bf3049a7d515214408d9a804fd03b46cd |
|  2 | Jane   |   0 | INSERT          | False             | d0a551cecbee0f9ad2b8a9e81bcc33b15a525a1e |
|  3 | George |   0 | INSERT          | False             | b98ad609fffdd6f00369485a896c52ca93b92b1f |
|  4 | Betty  |   0 | INSERT          | False             | e554e6e68293a51d8e69d68e9b6be991453cc901 |
|  5 | Sally  |   0 | INSERT          | False             | c94366cf8a4270cf299b049af68a04401c13976d |
+----+--------+-----+-----------------+-------------------+------------------------------------------+

-- Apply a $90 fee to members who joined the gym after a free trial period ended: merge into members m using ( select id, dt from signup s where datediff(day, '2018-08-15'::date, s.dt::date) < -30) s on m.id = s.id when matched then update set m.fee = 90;

select * from members;

+----+--------+-----+
| ID | NAME   | FEE |
|----+--------+-----|
|  1 | Joe    |  90 |
|  2 | Jane   |  90 |
|  3 | George |  90 |
|  4 | Betty  |   0 |
|  5 | Sally  |   0 |
+----+--------+-----+

-- The stream records the updated FEE column as a set of inserts -- rather than deletes and inserts because the stream contents -- have not been consumed yet select * from member_check;

+----+--------+-----+-----------------+-------------------+------------------------------------------+
| ID | NAME   | FEE | METADATA$ACTION | METADATA$ISUPDATE | METADATA$ROW_ID                          |
|----+--------+-----+-----------------+-------------------+------------------------------------------|
|  1 | Joe    |  90 | INSERT          | False             | 957e84b34ef0f3d957470e02bddccb027810892c |
|  2 | Jane   |  90 | INSERT          | False             | b00168a4edb9fb399dd5cc015e5f78cbea158956 |
|  3 | George |  90 | INSERT          | False             | 75206259362a7c89126b7cb039371a39d821f76a |
|  4 | Betty  |   0 | INSERT          | False             | 9b225bc2612d5e57b775feea01dd04a32ce2ad18 |
|  5 | Sally  |   0 | INSERT          | False             | 5a68f6296c975980fbbc569ce01033c192168eca |
+----+--------+-----+-----------------+-------------------+------------------------------------------+

-- Create a table to store member details in production create or replace table members_prod ( id number(8) not null, name varchar(255) default null, fee number(3) null );

-- Insert the first batch of stream data into the production table insert into members_prod(id,name,fee) select id, name, fee from member_check where metadata$action = 'INSERT' and id=1;

in the above insert statement, I have consumed only one record i.e id=1 from stream dataset(member_check) but after that my stream dataset became empty and ready for new offset.

That means snowflake does not check what are the records that are consumed from stream dataset, As soon as any DML statement triggered on stream dataset, Snowflake assumes that all the records have been consumed and empties the stream dataset. (Sic!!!)

1

1 Answers

0
votes

Streams are still a fairly new feature and this surprised me as well, but any DML statement advances the offset for the stream. My discovery involved a failed transaction in a stored procedure that emptied the stream.

The best option now is to either consume all changes in the stream at once within a single transaction OR to use the CHANGES feature from time travel instead.