I am streaming data into a BigQuery table.
- Delete the old table
- Create a new table with same name and same schema
- Stream data into new table
I had done this quite a few times before, it was working fine. But recently I started to see the above approach not working.
After streaming is done (no error reported), I query the table, sometimes it worked. Sometimes, I got empty table. (Same script, same data, run many times, the results are different. Sometimes works, sometime not.)
And to add to the mystery, when I streamed large amount data, it seemed working most of the times. But when I streamed small amount data, then it failed most of the times.
But if I just do
- Create a new table
- Stream data into the new table
It always works.
I tried this both in Google Apps Scrip and PHP Google Cloud Client Library for BigQuery. I had the same problems.
So I tried this in Google Apps Script
- Delete the old table
- Sleep 10 seconds, so the delete job should be done
- Create a new table with same name and same schema
- Sleep 10 seconds, so the create job should be done
- Stream data into new table
It still gave me the same problems.
But there are no error reported or logged.
Additional Information:
I tried again.
If I wait until the stream buffer is empty, and then run the script. The results are always correct. The new data streamed into the new table successfully.
But if I run the script, right after previous running, then the results are empty. The data is not streamed into the new table.
So error seems happening when I "delete the old table and create the new table" when stream buffer is not empty.
But according to the answer from this thread, BigQuery Stream and Delete while streaming buffer is not empty?,
the old table and new table (even they are with the same name and same schema), they are with two different "object id". They are actually two different tables. After I delete the old table, the old records in stream buffer would be dropped too. Stream buffer is empty or not, it should not affect my next steps, create a new table and stream new data to the new table.
On the other hand, if I try to "truncate old table", instead of "delete old table and create a new table", while there might still be data in stream buffer, then "DML statement cannot modify data still in stream buffer", so "truncate old table" would fail.
In simple words, in this use case,
- I cannot truncate the old table, because the steam buffer may not be empty.
- I am supposed to "delete old table and create new table, then stream data to new table". But it seems it is the root of my current problems, my new data cannot be streamed to new table (even the new table is with a new object id, and it should not be affected by the fact I just delete an old table)