Background
I have implemented a Snowflake data pipeline (S3 log file > SNS > pipe > stage table > stream > task > stored proc/UDF > final table) in our production Snowflake database.
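For context, the wiring looks roughly like the sketch below. These are placeholder object names and simplified options, not the actual production DDL:

```sql
-- Sketch only: placeholder names, simplified options.
-- Pipe loads S3 log files (via SNS notifications) into the stage table.
CREATE OR REPLACE PIPE raw_log_pipe
  AUTO_INGEST = TRUE
AS
  COPY INTO stage_table
  FROM @log_stage
  FILE_FORMAT = (TYPE = 'JSON');

-- Stream tracks changes on the stage table.
CREATE OR REPLACE STREAM stage_table_stream ON TABLE stage_table;

-- Task periodically calls the stored proc that moves stream changes
-- into the final table.
CREATE OR REPLACE TASK process_stage_task
  WAREHOUSE = etl_wh
  SCHEDULE  = '5 MINUTE'
  WHEN SYSTEM$STREAM_HAS_DATA('STAGE_TABLE_STREAM')
AS
  CALL process_stage_changes();
```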
While things were working at a smaller scale in our dev database, it seems the production pipeline has stopped working given the amount of data (6,416,006,096 records and growing) attempting to flow through it.
Problem
After some investigation, it looks like the S3 log > SNS > pipe > stage table portion is OK, but things are stuck where the task retrieves records from the stream. The stream is NOT stale. I have spent a lot of time reading the docs on streams and have not found anything there that addresses my current issue.
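For reference, the staleness check amounts to something like this (stream name is a placeholder):

```sql
-- The STALE column in the SHOW STREAMS output is FALSE,
-- and the stream still reports pending change data.
SHOW STREAMS LIKE 'STAGE_TABLE_STREAM';
SELECT SYSTEM$STREAM_HAS_DATA('STAGE_TABLE_STREAM');  -- returns TRUE
```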
It looks like the stream has too much data to return: when I try to get a COUNT(*) or a SELECT * with LIMIT 10 from the stream, the query has not returned after 8 minutes (and counting).
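Concretely, both of these queries hang (stream name is a placeholder):

```sql
-- Neither of these returns after 8+ minutes:
SELECT COUNT(*) FROM stage_table_stream;
SELECT * FROM stage_table_stream LIMIT 10;
```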
Even if I could limit the data returned, I have found through experimentation that once a transaction consumes from the stream, all pending changes are gone when it commits, even the rows you didn't want (e.g., rows excluded by a WHERE clause filter).
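As an illustration of what I mean (hypothetical table and column names), consuming only a filtered subset still advances the stream past everything:

```sql
-- Hypothetical names; illustrates the consume-everything-on-commit behavior.
BEGIN;

-- Insert only a filtered subset of the pending changes...
INSERT INTO final_table (col1, col2)
  SELECT col1, col2
  FROM stage_table_stream
  WHERE event_type = 'login';

COMMIT;

-- ...but after the commit the stream offset has advanced past ALL pending
-- changes, so the rows excluded by the WHERE clause are no longer available.
SELECT COUNT(*) FROM stage_table_stream;  -- 0
```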
Question
Is there any way to get anything to return from the stream without resetting it?
Is there any way to chunk the results from the stream without losing all changes within a transaction?
Is there some undocumented limit on streams that I have hit?
Concern
I don't want to shut down the data pipeline, because that means I may have to start all over, but I guess I will have to if I get no answers (I have contacted support too but have yet to hear back). Given that streams and tasks are still only in preview, I guess this shouldn't be a surprise, but I was told by Snowflake that they would be GA by now.