3
votes

I am trying to test my Dataflow pipeline on the DataflowRunner. My job always gets stuck at 1 hr 1 min with the message: The Dataflow appears to be stuck. Digging through the stack trace in Stackdriver, I come across the error: Failed to install packages: failed to install workflow: exit status 1. I saw other Stack Overflow posts saying this can happen when pip packages are not compatible. This is causing my worker startup to always fail.

This is my current setup.py. Can someone please help me understand what I am missing? The job ID is 2018-02-09_08_22_34-6196858167817670597.

setup.py

from setuptools import setup, find_packages


requires = [
            'numpy==1.14.0',
            'google-cloud-storage==1.7.0',
            'pandas==0.22.0',
            'sqlalchemy-vertica[pyodbc,turbodbc,vertica-python]==0.2.5',
            'sqlalchemy==1.2.2',
            'apache_beam[gcp]==2.2.0',
            'google-cloud-dataflow==2.2.0'
            ]

setup(
    name="dataflow_pipeline_dependencies",
    version="1.0.0",
    description="Beam pipeline for flattening ism data",
    packages=find_packages(),
    install_requires=requires
)

4 Answers

3
votes

Include the 'workflow' package in setup.py's required packages. The error is solved after including it.

from setuptools import setup, find_packages


requires = [
            'numpy==1.14.0',
            'google-cloud-storage==1.7.0',
            'pandas==0.22.0',
            'sqlalchemy-vertica[pyodbc,turbodbc,vertica-python]==0.2.5',
            'sqlalchemy==1.2.2',
            'apache_beam[gcp]==2.2.0',
            'google-cloud-dataflow==2.2.0',
            'workflow' # Include this line
            ]

setup(
    name="dataflow_pipeline_dependencies",
    version="1.0.0",
    description="Beam pipeline for flattening ism data",
    packages=find_packages(),
    install_requires=requires
)
2
votes

So I have figured out that workflow is not a PyPI package in this case, but actually the name of the .tar file that Dataflow creates containing your source code. Dataflow compresses your source code into a workflow.tar file in your staging environment and then tries to run pip install workflow.tar. If any issue comes up during this install, it fails to install the packages onto the workers.
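
If you want to reproduce that worker-side install locally before submitting the job, here is a minimal sketch (assuming it is run from the directory containing setup.py): build the same kind of source tarball that Dataflow stages as workflow.tar and pip install it, so any dependency conflict shows up on your machine instead of on the workers.

import glob
import subprocess
import sys

# Build a source distribution -- the same kind of tarball Dataflow stages as workflow.tar
subprocess.check_call([sys.executable, 'setup.py', 'sdist', '--dist-dir', 'dist'])

# Install the newest tarball the way the workers would; a conflicting pin in
# install_requires surfaces here as a pip error
tarball = sorted(glob.glob('dist/*.tar.gz'))[-1]
subprocess.check_call([sys.executable, '-m', 'pip', 'install', tarball])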

My issue was resolved by two things, sketched below:

  • I added six==1.10.0 to my requires, since I found from Workflow failed. Causes: (35af2d4d3e5569e4): The Dataflow appears to be stuck that there is an issue with the latest version of six.

  • I realized that sqlalchemy-vertica and sqlalchemy are out of sync and have conflicting dependency versions, so I removed my need for both and found a different Vertica client.
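
For reference, a hedged sketch of what the adjusted requires list could look like after those two changes (the replacement Vertica client is not named in this answer, so it is left out here):

requires = [
            'numpy==1.14.0',
            'google-cloud-storage==1.7.0',
            'pandas==0.22.0',
            'six==1.10.0',  # pin six explicitly
            'apache_beam[gcp]==2.2.0',
            'google-cloud-dataflow==2.2.0'
            # sqlalchemy and sqlalchemy-vertica removed; their dependency pins conflicted
            ]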

0
votes

I am no genius at dealing with lots of Python packages and managing all of their versions, incompatibilities, and the needs and wants of every one of them.

However, I can read error messages.

In your case the message says "failed to install workflow". After a quick Google search I found that "workflow" actually is a Python package.

So the error is simply complaining that you haven't installed workflow and that its attempt to do so failed.

To fix this problem:

  • Install workflow from its PyPI page (the latest version that Google showed me).

Or

  • Do the regular pip install workflow.

Either method should install the required package. Once it is installed, that particular error message should go away.

I hope this answer helped you!

0
votes

Your mileage may vary, but for me, none of the above worked (Python 3.7).

Instead, the solution seemed to be to keep my dependencies in a requirements.txt file and everything else in setup.py. It was important that I not load the requirements.txt lines into the install_requires property. However I did it, whether I included workflow or not, having install_requires seemed to lead me to this error.

Instead, my setup.py simply does not specify dependencies at all. I pass both the --requirements_file and --setup_file arguments when running the pipeline. That solved the issue for me, and there was a noticeable difference in how the pipeline built and launched: the dependencies were stored in the staging location this way, whereas before they were not.

For example:

setup.py

import setuptools

setuptools.setup(
    name='my_pipeline',
    version='0.0.0',
    packages=setuptools.find_packages()
)

requirements.txt

google-cloud-bigquery==1.24.0
google-cloud-storage==1.25.0
jinja2==2.11.1
[...etc...]

run_pipeline.sh

#!/usr/bin/env bash

[...code to set vars...]

if [ "${1}" = "dataflow" ]; then
  RUNNER="--runner DataflowRunner"
fi

python "${PIPELINE_FILE}" \
    --output "${OUTPUT}" \
    --project myproject \
    --region us-west1 \
    --temp_location "${TEMP}" \
    --staging_location "${STAGING}" \
    --no_use_public_ips \
    --requirements_file requirements.txt \
    --setup_file "./setup.py" \
    ${RUNNER}
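
For completeness, a minimal sketch of the kind of entry point ${PIPELINE_FILE} might point at (the my_pipeline.py name and the Create/Write steps are hypothetical placeholders, not the actual pipeline). The relevant detail is that any flag the script does not parse itself (--runner, --setup_file, --requirements_file, --project, and so on) is passed through to PipelineOptions, which is how the DataflowRunner picks them up.

my_pipeline.py (hypothetical)

import argparse

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


def run(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument('--output', required=True)
    # Everything not parsed here is forwarded to Beam as pipeline options,
    # including --setup_file and --requirements_file
    known_args, pipeline_args = parser.parse_known_args(argv)

    with beam.Pipeline(options=PipelineOptions(pipeline_args)) as p:
        (p
         | 'Create' >> beam.Create(['hello', 'world'])
         | 'Write' >> beam.io.WriteToText(known_args.output))


if __name__ == '__main__':
    run()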