
I am writing a Google Dataflow batch pipeline using the Python SDK and I have run into a pipeline failure that does not appear to have any logs in Stackdriver. The failure occurs while running beam.combiners.ToList() on a dataset of ~300 MB.
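For context, a minimal sketch of the failing step (the input path and read step here are placeholders, not my actual code):

```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Placeholder input pattern; the real dataset is ~300 MB.
INPUT_PATTERN = 'gs://my-bucket/input/*.txt'

with beam.Pipeline(options=PipelineOptions()) as p:
    elements = p | 'Read' >> beam.io.ReadFromText(INPUT_PATTERN)
    # The failure occurs in this step: ToList() collects the whole
    # PCollection into a single in-memory list.
    as_list = elements | 'ToList' >> beam.combiners.ToList()
```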

Stackdriver outputs:

The job failed because a work item has failed 4 times. Look in previous log entries for the cause of each one of the 4 failures. For more information, see https://cloud.google.com/dataflow/docs/guides/common-errors. The work item was attempted on these workers:

However, I cannot find any logs explaining why this is failing. I am running Python 3.6 and apache_beam==2.19.0. I am also running with --experiments=shuffle_mode=service, which I am not sure is related.

What are my next steps for debugging?


1 Answer


I can see that you have already searched the previous log entries, but have you filtered by Error-level and Fatal-level, as recommended in "The job failed because a work item failed 4 times"? That link describes 4 similar errors, any of which could be the cause of the failure.
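If it helps, here is a sketch of how you could pull only the Error- and Fatal-level entries for the job programmatically, assuming the google-cloud-logging client library is installed (the job ID is a placeholder):

```python
from google.cloud import logging

client = logging.Client()

# Placeholder; replace with the failing Dataflow job's ID.
JOB_ID = 'YOUR_DATAFLOW_JOB_ID'

# Restrict to Error-level and above for this job's worker logs.
log_filter = (
    'resource.type="dataflow_step" '
    f'resource.labels.job_id="{JOB_ID}" '
    'severity>=ERROR'
)

for entry in client.list_entries(filter_=log_filter):
    print(entry.severity, entry.payload)
```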

In addition, you can add debug messages to verify that your steps are flowing correctly; refer to "Adding log messages to your pipeline" for more information.
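As a minimal sketch, the standard Python logging module is all you need inside a DoFn; Dataflow forwards these messages to Stackdriver (DebugFn is a hypothetical name, not part of your pipeline):

```python
import logging

import apache_beam as beam


class DebugFn(beam.DoFn):
    """Pass-through DoFn that logs each element it sees."""

    def process(self, element):
        # These messages appear in Stackdriver under the worker logs.
        logging.info('Processing element: %s', element)
        yield element


# Set the root logger level in your main entry point, otherwise
# INFO-level messages may be filtered out before reaching Stackdriver.
logging.getLogger().setLevel(logging.INFO)
```

Inserting a step like `| 'Debug' >> beam.ParDo(DebugFn())` just before the ToList() would tell you whether elements are reaching that step before the work item fails.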