I am writing a Google Dataflow batch pipeline using the Python SDK and have hit a pipeline failure that does not appear to leave any logs in Stackdriver. The failure occurs while running beam.combiners.ToList() on a dataset of roughly 300MB.
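For reference, the failing step is shaped roughly like this (the source path and transform labels below are placeholders, not my actual input):

```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Minimal sketch of the failing step; the real source and names differ,
# but the shape is the same.
with beam.Pipeline(options=PipelineOptions()) as p:
    records = (
        p
        | "ReadRecords" >> beam.io.ReadFromText("gs://my-bucket/input/*.json")
    )
    # ToList collapses the entire ~300MB PCollection into a single list
    # held in memory; this is the step at which the job fails.
    collected = records | "CollectToList" >> beam.combiners.ToList()
```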
Stackdriver outputs:
The job failed because a work item has failed 4 times. Look in previous log entries for the cause of each one of the 4 failures. For more information, see https://cloud.google.com/dataflow/docs/guides/common-errors. The work item was attempted on these workers:
However, I cannot find any logs explaining why it is failing. I am running Python 3.6 and apache_beam==2.19.0. I am also running with --experiments=shuffle_mode=service, which I am not sure is related.
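For completeness, this is roughly how the pipeline options are set up (the project, region, and bucket values below are placeholders; the experiments flag is the one I actually pass):

```python
from apache_beam.options.pipeline_options import PipelineOptions

# Placeholder project/bucket values; only the experiments flag is verbatim.
options = PipelineOptions(
    runner="DataflowRunner",
    project="my-project",
    region="us-central1",
    temp_location="gs://my-bucket/tmp",
    experiments=["shuffle_mode=service"],
)
```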
What are my next steps for debugging?