3
votes

I have a Google Cloud Dataflow job (using the Apache Beam Python SDK) scheduled to run every 6 minutes. It reads from a BigQuery table, applies some transformations, and writes to another BigQuery table. This job has started failing intermittently (about 4 out of 10 runs) with the following error trace.

2021-02-17 14:51:18.146 IST Error message from worker: Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/dataflow_worker/batchworker.py", line 649, in do_work
    work_executor.execute()
  File "/usr/local/lib/python3.8/site-packages/dataflow_worker/executor.py", line 225, in execute
    self.response = self._perform_source_split_considering_api_limits(
  File "/usr/local/lib/python3.8/site-packages/dataflow_worker/executor.py", line 233, in _perform_source_split_considering_api_limits
    split_response = self._perform_source_split(source_operation_split_task,
  File "/usr/local/lib/python3.8/site-packages/dataflow_worker/executor.py", line 271, in _perform_source_split
    for split in source.split(desired_bundle_size):
  File "/usr/local/lib/python3.8/site-packages/apache_beam/io/gcp/bigquery.py", line 807, in split
    self.table_reference = self._execute_query(bq)
  File "/usr/local/lib/python3.8/site-packages/apache_beam/options/value_provider.py", line 135, in _f
    return fnc(self, *args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/apache_beam/io/gcp/bigquery.py", line 851, in _execute_query
    job = bq._start_query_job(
  File "/usr/local/lib/python3.8/site-packages/apache_beam/utils/retry.py", line 236, in wrapper
    return fun(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/apache_beam/io/gcp/bigquery_tools.py", line 459, in _start_query_job
    response = self.client.jobs.Insert(request) 
  File "/usr/local/lib/python3.8/site-packages/apache_beam/io/gcp/internal/clients/bigquery/bigquery_v2_client.py", line 344, in Insert
    return self._RunMethod(
  File "/usr/local/lib/python3.8/site-packages/apitools/base/py/base_api.py", line 731, in _RunMethod
    return self.ProcessHttpResponse(method_config, http_response, request)
  File "/usr/local/lib/python3.8/site-packages/apitools/base/py/base_api.py", line 737, in ProcessHttpResponse 
    self.__ProcessHttpResponse(method_config, http_response, request))
  File "/usr/local/lib/python3.8/site-packages/apitools/base/py/base_api.py", line 603, in __ProcessHttpResponse
    raise exceptions.HttpError.FromResponse(
apitools.base.py.exceptions.HttpConflictError: HttpError accessing <https://bigquery.googleapis.com/bigquery/v2/projects/bbb-erizo/jobs?alt=json>:
      response: <{
        'vary': 'Origin, X-Origin, Referer',
        'content-type': 'application/json; charset=UTF-8',
        'date': 'Wed, 17 Feb 2021 09:21:17 GMT',
        'server': 'ESF',
        'cache-control': 'private',
        'x-xss-protection': '0',
        'x-frame-options': 'SAMEORIGIN',
        'x-content-type-options': 'nosniff',
        'transfer-encoding': 'chunked',
        'status': '409',
        'content-length': '402',
        '-content-encoding': 'gzip'
      }>,
      content <{
        "error": {
          "code": 409,
          "message": "Already Exists: Job bbb-erizo:asia-northeast1.beam_bq_job_QUERY_AUTOMATIC_JOB_NAME_a2207822-8_754",
          "errors": [ {
            "message": "Already Exists: Job bbb-erizo:asia-northeast1.beam_bq_job_QUERY_AUTOMATIC_JOB_NAME_a2207822-8_754",
            "domain": "global", "reason": "duplicate"
           } ],
         "status": "ALREADY_EXISTS"
       }
     } >

From the error trace, it seems that the BigQuery job ID generated by the Dataflow job is somehow being duplicated. However, I am not assigning the BigQuery job ID explicitly; Dataflow generates it itself, so I have no control over that part.
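To illustrate the kind of duplication I suspect, here is a minimal sketch. It is not Beam's actual naming code. It assumes that the job name is a fixed prefix (the part that stays the same between runs) plus a small random suffix, and that the trailing `_754` in the error is such a suffix drawn from a range of 1000 values. Both the range and the helper names (`make_job_name`, `runs_until_collision`, `collision_probability`) are my own assumptions for illustration.

```python
import random

# Assumption: the trailing "_754" in the job name is a random suffix drawn
# from a small range; 1000 is a guess for illustration, not a Beam constant.
SUFFIX_RANGE = 1000


def make_job_name(prefix: str, rng: random.Random) -> str:
    """Build a name from a fixed prefix plus a small random suffix."""
    return f"{prefix}_{rng.randrange(SUFFIX_RANGE)}"


def runs_until_collision(prefix: str, rng: random.Random) -> int:
    """Count how many names are generated until one repeats an earlier name."""
    seen = set()
    while True:
        name = make_job_name(prefix, rng)
        if name in seen:
            return len(seen) + 1
        seen.add(name)


def collision_probability(runs: int, suffix_range: int = SUFFIX_RANGE) -> float:
    """Birthday-problem probability that at least two of `runs` names match."""
    p_unique = 1.0
    for i in range(runs):
        p_unique *= max(0.0, (suffix_range - i) / suffix_range)
    return 1.0 - p_unique


if __name__ == "__main__":
    prefix = "beam_bq_job_QUERY_AUTOMATIC_JOB_NAME_a2207822-8"
    print(runs_until_collision(prefix, random.Random()))
    # A run every 6 minutes is 240 runs per day.
    print(collision_probability(240))
```

Under these assumptions, a job that runs 240 times a day would almost certainly reuse a name within the first day, and BigQuery rejects a reused job ID with the 409 shown above.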

Any suggestions would be appreciated.

1
Which SDK version are you using? This was fixed in 2.26 with 13001 - Peter Kim
As a workaround, you can recreate your Dataflow template frequently to avoid name collisions. - Pablo

1 Answer

5
votes

This is a bug. It should be fixed by https://github.com/apache/beam/pull/13749, which will be part of Beam 2.28.0.
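If you want to check whether a given SDK version is at or past that release, a small helper along these lines may be enough. This is only a sketch: `parse_version` and `has_fix` are hypothetical names, and the parsing is deliberately simple (it ignores suffixes such as `rc1`, so a pre-release of 2.28.0 would also count).

```python
FIXED_IN = (2, 28, 0)  # the release named in this answer


def parse_version(version: str) -> tuple:
    """Turn '2.27.0' into (2, 27, 0), ignoring suffixes such as 'rc1'."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    # Pad short versions such as '2.28' to three components.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def has_fix(version: str) -> bool:
    """Return True if `version` is at least the release that carries the fix."""
    return parse_version(version) >= FIXED_IN
```

You can pass it `apache_beam.__version__` from the environment that builds and launches the job, for example `has_fix(apache_beam.__version__)`.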