The Apache Beam API provides the BigQuery insert retry policies listed below. I have two questions:
- How does a Dataflow job behave if I specify retryTransientErrors?
- shouldRetry hands me the error returned by BigQuery so that I can decide whether to retry. Where can I find the errors BigQuery is expected to return?
BigQuery insert retry policies
- alwaysRetry - Always retry all failures.
- neverRetry - Never retry any failures.
- retryTransientErrors - Retry all failures except for known persistent errors.
- shouldRetry - Return true if this failure should be retried.
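For the shouldRetry case, the policy is handed the per-row insert errors that BigQuery returned, and the reason strings (for example "invalid" in the log below) are what a custom policy can branch on. Here is a minimal sketch, assuming the Beam Java SDK's InsertRetryPolicy.Context and its getInsertErrors() accessor; the policy name retryUnlessInvalid is my own:

```java
import com.google.api.services.bigquery.model.ErrorProto;
import org.apache.beam.sdk.io.gcp.bigquery.InsertRetryPolicy;

// Retry everything except rows whose error reason is "invalid",
// the reason string seen in the error log below.
InsertRetryPolicy retryUnlessInvalid = new InsertRetryPolicy() {
  @Override
  public boolean shouldRetry(Context context) {
    if (context.getInsertErrors() != null
        && context.getInsertErrors().getErrors() != null) {
      for (ErrorProto error : context.getInsertErrors().getErrors()) {
        if ("invalid".equals(error.getReason())) {
          return false; // persistent error: do not retry
        }
      }
    }
    return true; // assume anything else is transient and worth retrying
  }
};
```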
Background
- When my Cloud Dataflow job inserted a very old timestamp (more than one year in the past) into BigQuery, I got the following error:
```
jsonPayload: {
  exception: "java.lang.RuntimeException: java.io.IOException: Insert failed:
  [{"errors":[{"debugInfo":"","location":"","message":"Value 690000000 for field
  timestamp_scanned of the destination table fr-prd-datalake:rfid_raw.store_epc_transactions_cr_uqjp
  is outside the allowed bounds. You can only stream to date range within 365 days
  in the past and 183 days in the future relative to the current date.","reason":"invalid"}],
```
- After the first error, Dataflow retried the insert, and BigQuery always rejected it with the same error.
- The retries did not stop, so I added retryTransientErrors to the BigQueryIO.Write step (see the sketch below), and the retries then stopped.
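For reference, a minimal sketch of how such a policy can be attached to the write step. The pipeline setup and the sample row are placeholders of my own, but withFailedInsertRetryPolicy and the STREAMING_INSERTS method are the Beam APIs involved:

```java
import com.google.api.services.bigquery.model.TableRow;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.bigquery.InsertRetryPolicy;
import org.apache.beam.sdk.io.gcp.bigquery.TableRowJsonCoder;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.values.PCollection;

public class WriteWithRetryPolicy {
  public static void main(String[] args) {
    Pipeline pipeline = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    // Placeholder input: one row with a recent timestamp. A real pipeline
    // would produce these rows from an upstream source.
    PCollection<TableRow> rows =
        pipeline.apply(
            Create.of(new TableRow().set("timestamp_scanned", "2019-06-01 00:00:00 UTC"))
                .withCoder(TableRowJsonCoder.of()));

    rows.apply(
        "WriteToBigQuery",
        BigQueryIO.writeTableRows()
            .to("fr-prd-datalake:rfid_raw.store_epc_transactions_cr_uqjp")
            // Insert retry policies only apply to streaming inserts.
            .withMethod(BigQueryIO.Write.Method.STREAMING_INSERTS)
            // The table already exists, so no schema / auto-create is needed.
            .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_NEVER)
            // Stop retrying known persistent failures (reason "invalid" etc.);
            // transient errors are still retried.
            .withFailedInsertRetryPolicy(InsertRetryPolicy.retryTransientErrors()));

    pipeline.run();
  }
}
```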