I'm getting some unexpected behavior from Snowflake and I'm hoping someone could explain what's happening and the best way to handle it. Basically, I'm trying to do a nightly refresh of an entire dataset, but truncating the table and then staging/copying new data into it results in the old data reappearing instead of the new data.
I'm using the Python connector with AUTOCOMMIT=False. Transactions are committed manually after every step.
Step 1: Data is loaded into an empty table.
put file://test.csv @test_db.test_schema.%test_table overwrite=true
copy into test_db.test_schema.test_table file_format=(format_name=my_format)
Step 2: Data is truncated
TRUNCATE test_db.test_schema.test_table
Step 3: New data is loaded into the now empty table (same filename, but overwrite set to True).
put file://test.csv @test_db.test_schema.%test_table overwrite=true
copy into test_db.test_schema.test_table file_format=(format_name=my_format)
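For reference, the three steps above look roughly like this in my Python code. This is a simplified sketch: it assumes an already-open connector session with autocommit disabled, and the helper name `refresh_table` is just for illustration.

```python
# Sketch of the nightly refresh flow (assumes an already-open Snowflake
# connection created with autocommit=False; names match the steps above).

REFRESH_STATEMENTS = [
    # Upload the local file to the table stage, replacing any previously
    # staged copy with the same name.
    "put file://test.csv @test_db.test_schema.%test_table overwrite=true",
    # Load the staged file into the table.
    "copy into test_db.test_schema.test_table "
    "file_format=(format_name=my_format)",
]


def refresh_table(conn):
    """Truncate the table, re-stage the file, and reload it,
    committing manually after each step as described above."""
    cur = conn.cursor()
    try:
        # Step 2: empty the table.
        cur.execute("truncate test_db.test_schema.test_table")
        conn.commit()
        # Step 3: PUT with overwrite, then COPY INTO.
        for stmt in REFRESH_STATEMENTS:
            cur.execute(stmt)
        conn.commit()
    finally:
        cur.close()
```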
At this point, if I query the table, I see the data from Step 1, not Step 3. If in Step 2 I DROP and recreate the table instead of using TRUNCATE, I see the data from Step 3 as expected. I'm trying to understand what is happening. Is Snowflake using a cached version of the data, even though I'm using PUT with OVERWRITE=TRUE? What's the best way to achieve the behavior I want? Thank you!