0
votes

Hi,

While streaming data to BigQuery, we are seeing inconsistencies in the ingested data when making https://cloud.google.com/bigquery/docs/reference/rest/v2/tabledata/insertAll requests using the BigQuery Java library.

Some batches fail with the error code backendError, while other requests time out with this exception stack trace: https://gist.github.com/anonymous/18aea1c72f8d22d2ea1792bb2ffd6139

For the batches that failed, we have observed 3 different behaviours in the ingested data:

  1. All records in that batch fail to be ingested into BigQuery
  2. Only some of the records fail to be ingested into BigQuery
  3. All records are successfully ingested into BigQuery in spite of the thrown error

Our questions are:

  1. How can we distinguish between these 3 cases?
  2. For case 2, how can we handle partially ingested data, i.e., which records from that batch should be retried?
  3. For case 3, if all records were successfully ingested, why is an error thrown at all?

Thanks in advance...


1 Answer

0
votes

For partial success, the error response indicates which rows were inserted and which ones failed, particularly for parsing errors. In some cases the response never reaches your client, so you see a timeout even though the insert succeeded. In general, you can retry the entire batch: the rows will be deduplicated if you follow the approach outlined in the data consistency documentation, which means supplying an insertId for each row and reusing the same insertId on every retry. Note that this deduplication is best effort.
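The retry approach can be sketched roughly as below. This is a minimal illustration, not the library's API: `RowSink` is a hypothetical stand-in for the streaming call, and `assignIds` and `insertWithRetry` are names made up for this example. With the real Java client, the ID would be the one passed to `InsertAllRequest.Builder.addRow(String id, Map<String, Object> content)`, and the failed rows would come from `InsertAllResponse.getInsertErrors()`. The sketch also leaves out backoff between attempts, which you would likely want in practice.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

final class InsertRetrySketch {

    /**
     * Hypothetical stand-in for the streaming insert call.
     * Returns the insert IDs of rows the backend reported as failed;
     * throws if the outcome is unknown (timeout, backendError, ...).
     */
    interface RowSink {
        Set<String> insert(Map<String, Map<String, Object>> rowsById) throws Exception;
    }

    /** Assigns each row an ID once, before the first attempt, so every retry reuses it. */
    static Map<String, Map<String, Object>> assignIds(List<Map<String, Object>> rows) {
        Map<String, Map<String, Object>> rowsById = new LinkedHashMap<>();
        for (Map<String, Object> row : rows) {
            rowsById.put(UUID.randomUUID().toString(), row);
        }
        return rowsById;
    }

    /**
     * Sends the batch up to maxAttempts times and returns the rows that are
     * still not confirmed as inserted after the last attempt.
     */
    static Map<String, Map<String, Object>> insertWithRetry(
            RowSink sink, Map<String, Map<String, Object>> rowsById, int maxAttempts) {
        Map<String, Map<String, Object>> pending = new LinkedHashMap<>(rowsById);
        for (int attempt = 1; attempt <= maxAttempts && !pending.isEmpty(); attempt++) {
            try {
                // Partial success: keep only the rows reported as failed.
                Set<String> failedIds = sink.insert(pending);
                pending.keySet().retainAll(failedIds);
            } catch (Exception e) {
                // Outcome unknown: some or all rows may already be stored.
                // Resend the whole pending batch with the SAME IDs and rely
                // on server-side deduplication to drop the repeats.
            }
        }
        return pending;
    }
}
```

The key design choice is that IDs are generated before the first attempt rather than inside the retry loop; generating fresh IDs per attempt would defeat the deduplication and could produce duplicate rows in case 3.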