
I've got an ETL process that streams data from a MongoDB cluster to BigQuery. It runs via cron on a weekly basis, and manually when needed. I have a separate dataset for each of our customers, with identical table structures across them.

I just ran the process, only to find that while all of my data chunks returned a "success" response ({"kind": "bigquery#tableDataInsertAllResponse"}) from the insertAll API, the table is empty for one specific dataset.
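For context, my success check looks roughly like the following (a minimal sketch using the google-cloud-bigquery Python client; the table and row values are placeholders, not my actual code):

```python
from google.cloud import bigquery

client = bigquery.Client()

# Hypothetical chunk of Mongo documents already converted to JSON-serializable dicts.
rows = [{"customer_id": "c-123", "value": 42}]

# insert_rows_json wraps tabledata.insertAll; it returns a list of
# per-row error dicts, which is empty when every row was accepted.
errors = client.insert_rows_json("my-project.customer_dataset.events", rows)
if errors:
    raise RuntimeError(f"insertAll reported row errors: {errors}")
# An empty list here is the "success" response I am seeing.
```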

I had seen this happen a few times before, but was never able to reproduce it. I've now run the process twice more with the same result. I know my code is working, because the other datasets are properly populated.

There's no 'streaming buffer' in the table details, and running a count(*) query returns 0. I've even tried disabling cached results for the query to force freshness - but nothing helps.
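For completeness, this is roughly how I disable the cache (a sketch; the table name is a placeholder):

```python
from google.cloud import bigquery

client = bigquery.Client()

# use_query_cache=False forces BigQuery to re-execute the query
# instead of serving a previously cached result.
job_config = bigquery.QueryJobConfig(use_query_cache=False)
query = "SELECT COUNT(*) AS n FROM `my-project.customer_dataset.events`"
result = client.query(query, job_config=job_config).result()
print(next(iter(result)).n)  # still prints 0 for the affected dataset
```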

Edit - About 10 minutes after my data stream finished (I keep timestamped logs), partial data appeared in the table; however, after another 40 minutes, it doesn't look like any new data is arriving.

Is anyone else experiencing hiccups in the streaming service?

It might be worth mentioning that part of my process is to copy the existing table to a backup table, remove the original table, and recreate it with the latest schema. Could this be affecting the inserts in some edge cases?
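That part of the process looks roughly like this (a sketch with placeholder names and schema, not my exact code):

```python
from google.cloud import bigquery

client = bigquery.Client()
table_id = "my-project.customer_dataset.events"
backup_id = "my-project.customer_dataset.events_backup"

# Hypothetical latest schema; the real one is derived from the Mongo documents.
schema = [
    bigquery.SchemaField("customer_id", "STRING"),
    bigquery.SchemaField("value", "INTEGER"),
]

# 1. Copy the existing table to a backup table.
client.copy_table(table_id, backup_id).result()

# 2. Remove the original table.
client.delete_table(table_id, not_found_ok=True)

# 3. Recreate it with the latest schema.
client.create_table(bigquery.Table(table_id, schema=schema))

# ...streaming inserts begin immediately afterwards.
```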


1 Answer


This is probably what is happening to you: BigQuery table truncation before streaming not working

If you delete or recreate a table, you must wait at least 2 minutes before streaming data into it.
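In the flow you described, that would mean adding a delay between recreating the table and the first insert. A minimal sketch of the workaround, assuming the delete/recreate flow from the question (names, schema, and rows are placeholders):

```python
import time
from google.cloud import bigquery

client = bigquery.Client()
table_id = "my-project.customer_dataset.events"
schema = [
    bigquery.SchemaField("customer_id", "STRING"),
    bigquery.SchemaField("value", "INTEGER"),
]

client.delete_table(table_id, not_found_ok=True)
client.create_table(bigquery.Table(table_id, schema=schema))

# Give the streaming backend time to register the recreated table;
# rows streamed sooner can be silently dropped even though insertAll
# returns a success response.
time.sleep(120)

errors = client.insert_rows_json(table_id, [{"customer_id": "c-1", "value": 1}])
assert not errors, errors
```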

Since you mentioned that all the other tables are populated correctly and only the table that goes through the deletion process is missing data, this probably explains what you are observing.

To fix this issue, you can either wait a bit longer before streaming data after the delete and create operations, or change your upload strategy (for example, saving the data to a CSV file and then using a load job to upload it into the table).
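A sketch of the load-job alternative with the google-cloud-bigquery client (the file and table names are placeholders):

```python
from google.cloud import bigquery

client = bigquery.Client()
table_id = "my-project.customer_dataset.events"

# Load jobs bypass the streaming path entirely, so they are not
# affected by the warm-up window after a table is recreated.
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,  # skip the header row
    autodetect=True,      # or pass an explicit schema instead
)

with open("chunk.csv", "rb") as f:
    job = client.load_table_from_file(f, table_id, job_config=job_config)

job.result()  # waits for the load job to finish; raises on failure
```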