
The Apache Beam API provides the following BigQuery insert retry policies.

  • How does a Dataflow job behave if I specify retryTransientErrors?
  • shouldRetry gives me the error from BigQuery so that I can decide whether to retry. Where can I find the errors BigQuery is expected to return?

BigQuery insert retry policies

https://beam.apache.org/releases/javadoc/2.1.0/org/apache/beam/sdk/io/gcp/bigquery/InsertRetryPolicy.html

  • alwaysRetry - Always retry all failures.
  • neverRetry - Never retry any failures.
  • retryTransientErrors - Retry all failures except for known persistent errors.
  • shouldRetry - Return true if this failure should be retried.
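
For context, a minimal sketch of attaching one of these policies to a write step via BigQueryIO.Write.withFailedInsertRetryPolicy, written against a recent Beam Java SDK (the project, dataset, table, and field names below are placeholders):

    import com.google.api.services.bigquery.model.TableRow;
    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
    import org.apache.beam.sdk.io.gcp.bigquery.InsertRetryPolicy;
    import org.apache.beam.sdk.io.gcp.bigquery.TableRowJsonCoder;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;
    import org.apache.beam.sdk.transforms.Create;

    public class RetryPolicyExample {
      public static void main(String[] args) {
        Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

        p.apply("CreateRow",
                Create.of(new TableRow().set("timestamp_scanned", "1991-11-13T00:00:00Z"))
                    .withCoder(TableRowJsonCoder.of()))
            .apply("WriteToBigQuery",
                BigQueryIO.writeTableRows()
                    .to("my-project:my_dataset.my_table")
                    // Retry policies only apply to streaming inserts.
                    .withMethod(BigQueryIO.Write.Method.STREAMING_INSERTS)
                    // Stop retrying rows whose errors BigQuery marks as persistent.
                    .withFailedInsertRetryPolicy(InsertRetryPolicy.retryTransientErrors())
                    .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_NEVER)
                    .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND));

        p.run();
      }
    }

Swapping in alwaysRetry or neverRetry works the same way through withFailedInsertRetryPolicy.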

Background

  • When my Cloud Dataflow job inserts a very old timestamp (more than one year in the past) into BigQuery, I get the following error:
    jsonPayload: {
      exception: "java.lang.RuntimeException: java.io.IOException: Insert failed:
        [{"errors":[{"debugInfo":"","location":"","message":"Value 690000000 for field
        timestamp_scanned of the destination table
        fr-prd-datalake:rfid_raw.store_epc_transactions_cr_uqjp is outside the allowed
        bounds. You can only stream to date range within 365 days in the past and 183
        days in the future relative to the current date.","reason":"invalid"}],
  • After the first error, Dataflow retries the insert, and it is always rejected by BigQuery with the same error.
  • The retries never stopped, so I added retryTransientErrors to the BigQueryIO.Write step, and then the retrying stopped.

1 Answer


How does a Dataflow job behave if I specify retryTransientErrors?

All errors are considered transient except those for which BigQuery reports an error reason of "invalid", "invalidQuery", or "notImplemented".
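
In other words, retryTransientErrors inspects the reason field on each insert error and gives up only when it matches one of those persistent reasons. A rough self-contained sketch of that classification (InsertError and shouldRetry below are simplified stand-ins, not Beam's internal types; the reason set comes from the sentence above):

    import java.util.List;
    import java.util.Set;

    public class TransientErrorCheck {
      // Error reasons treated as persistent, i.e. not worth retrying.
      private static final Set<String> PERSISTENT_ERRORS =
          Set.of("invalid", "invalidQuery", "notImplemented");

      // Simplified stand-in for a per-row insert error returned by BigQuery.
      record InsertError(String reason, String message) {}

      // Retry unless any error carries a known persistent reason.
      static boolean shouldRetry(List<InsertError> errors) {
        for (InsertError error : errors) {
          if (error.reason() != null && PERSISTENT_ERRORS.contains(error.reason())) {
            return false;
          }
        }
        return true;
      }

      public static void main(String[] args) {
        // The out-of-bounds timestamp above fails with reason "invalid",
        // so retryTransientErrors stops retrying it.
        System.out.println(shouldRetry(List.of(
            new InsertError("invalid", "outside the allowed bounds")))); // false
        // A transient backend error keeps being retried.
        System.out.println(shouldRetry(List.of(
            new InsertError("backendError", "temporary failure"))));     // true
      }
    }

This is why the asker's out-of-bounds timestamp stopped being retried once retryTransientErrors was set: BigQuery reports it with reason "invalid".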

shouldRetry gives me the error from BigQuery so that I can decide whether to retry. Where can I find the errors BigQuery is expected to return?

You can't, since the errors are not visible to the caller. I'm not sure whether this was done on purpose or whether Apache Beam should expose the errors so users can write their own retry logic.
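
Purely as an illustration of what such user-written retry logic could look like if the errors were exposed, here is a hypothetical sketch; Context, RetryPolicy, and errorReasons are local stand-ins, not Beam's actual API:

    import java.util.List;

    public class CustomRetrySketch {
      // Hypothetical stand-in for InsertRetryPolicy.Context, imagined here
      // to carry the BigQuery error reasons for the failed row.
      record Context(List<String> errorReasons) {}

      // Hypothetical stand-in for a user-defined InsertRetryPolicy.
      interface RetryPolicy {
        boolean shouldRetry(Context context);
      }

      public static void main(String[] args) {
        // Example policy: keep retrying unless BigQuery rejected the row
        // as "invalid" (e.g. the out-of-bounds timestamp from the question).
        RetryPolicy skipInvalidRows =
            context -> context.errorReasons().stream().noneMatch("invalid"::equals);

        System.out.println(skipInvalidRows.shouldRetry(
            new Context(List.of("invalid"))));      // false: drop the row
        System.out.println(skipInvalidRows.shouldRetry(
            new Context(List.of("backendError")))); // true: retry
      }
    }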