I am using Google Dataflow Service to run some apache-beam scripts for ETL.
The jobs initially took 4-5 minutes to complete, but now they fail after about an hour with the following error:
Workflow failed. Causes: (35af2d4d3e5569e4): The Dataflow appears to be stuck.
It appears that the job didn't actually start.
I was executing it with Python SDK 2.1.0. Since the answer to this question suggested switching the SDK, I also tried executing it with Python SDK 2.0.0, but no luck.
Job Id is: 2017-09-28_04_28_31-11363700448712622518
Update:
After @BenChambers suggested checking the logs, it appears the job didn't start because the workers failed to start up.
The logs showed the following output 4 times (as mentioned in the Dataflow docs, a bundle is tried 4 times before it is declared failed):
Running setup.py install for dataflow-worker: finished with status 'done'
Successfully installed dataflow-worker-2.1.0
Executing: /usr/local/bin/pip install /var/opt/google/dataflow/workflow.tar.gz
Processing /var/opt/google/dataflow/workflow.tar.gz
Complete output from command python setup.py egg_info:
Traceback (most recent call last):
File "<string>", line 1, in <module>
IOError: [Errno 2] No such file or directory: '/tmp/pip-YAAeGg-build/setup.py'
----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-YAAeGg-build/
/usr/local/bin/pip failed with exit status 1
Dataflow base path override: https://dataflow.googleapis.com/
Failed to report setup error to service: could not lease work item to report failure (no work items returned)
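From the traceback, pip extracted workflow.tar.gz but could not find a setup.py in the build directory, so the `python setup.py egg_info` step failed before any worker came up. As a local sanity check (a sketch, not part of the Dataflow SDK; the path and helper name are mine), the staged tarball can be inspected with the standard library to confirm a setup.py is where pip expects it:

```python
import tarfile

def tarball_contains_setup_py(path):
    """Heuristic check that a source tarball carries a setup.py where
    pip's `python setup.py egg_info` step could find it: either at the
    archive root or inside a single top-level directory."""
    with tarfile.open(path, "r:gz") as tar:
        for name in tar.getnames():
            parts = name.split("/")
            # depth <= 1 means root-level or one directory deep
            if parts[-1] == "setup.py" and len(parts) <= 2:
                return True
    return False

# e.g. run against a local copy of the staged archive:
# tarball_contains_setup_py("workflow.tar.gz")
```

If the check returns False, the likely cause is that the `--setup_file` pipeline option points at the wrong file, or the package layout puts setup.py somewhere pip's extraction does not look.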