I've installed the apache-beam Python SDK and Apache Airflow in a Docker container.
Python version: 3.5
Apache Airflow: 1.10.5
I'm trying to execute an apache-beam pipeline using **DataFlowPythonOperator**.
When I run the DAG from the Airflow UI, I get:

    Import Error: import apache_beam as beam. Module not found
With the same setup I tried **DataflowTemplateOperator** and it works perfectly fine.
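For reference, the working template task looks roughly like the sketch below; the template path, project, and task names here are placeholders, not the real values from my DAG:

    from airflow.contrib.operators.dataflow_operator import DataflowTemplateOperator

    # Placeholder values; the real DAG points at our own project, bucket, and template.
    template_task = DataflowTemplateOperator(
        task_id='process_details_template',
        template='gs://bucket_name/templates/process_details',
        gcp_conn_id='google_cloud_default',
        dataflow_default_options={
            'project': 'xxxxx',
            'tempLocation': 'gs://bucket_name/path/to/temp',
        },
        dag=test_dag)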
When I tried the same Docker setup with Python 2 and Apache Airflow 1.10.3 two months back, the operator didn't return any error and worked as expected.
After SSHing into the Docker container I checked the installed libraries with pip freeze, and both packages are present: `apache-airflow==1.10.5` and `apache-beam==2.15.0`.
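To rule out the package being installed for a different interpreter than the one Airflow runs under, a quick check like this can be run inside the container (plain Python, nothing Airflow-specific):

    import sys

    # Which interpreter is running, and where it looks for modules.
    print(sys.executable)
    print(sys.path)

    # Try the import that the DAG fails on.
    try:
        import apache_beam as beam
        print('apache_beam', beam.__version__, 'loaded from', beam.__file__)
    except ImportError as err:
        print('import failed:', err)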
Dockerfile:

    RUN pip install --upgrade pip
    RUN pip install --upgrade setuptools
    # apache-beam[gcp] already pulls in apache-beam, so the plain install is redundant
    RUN pip install apache-beam
    RUN pip install apache-beam[gcp]
    RUN pip install google-api-python-client
    ADD . /home/beam
    RUN pip install apache-airflow[gcp_api]
airflow operator:

    from airflow.contrib.operators.dataflow_operator import DataFlowPythonOperator

    new_task = DataFlowPythonOperator(
        task_id='process_details',
        py_file="path/to/file/filename.py",
        gcp_conn_id='google_cloud_default',
        dataflow_default_options={
            'project': 'xxxxx',
            'runner': 'DataflowRunner',
            'job_name': "process_details",
            'temp_location': 'GCS/path/to/temp',
            'staging_location': 'GCS/path/to/staging',
            'input_bucket': 'bucket_name',
            'input_path': 'GCS/path/to/bucket',
            'input-files': 'GCS/path/to/file.csv'
        },
        dag=test_dag)
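For completeness, filename.py begins with the import that the error points at; a trimmed-down sketch of it (the real transforms and paths are omitted, and the GCS paths here are placeholders):

    import apache_beam as beam  # this is the import that raises the error
    from apache_beam.options.pipeline_options import PipelineOptions


    def run():
        options = PipelineOptions()
        with beam.Pipeline(options=options) as p:
            (p
             | 'Read' >> beam.io.ReadFromText('gs://bucket_name/path/to/file.csv')
             | 'Write' >> beam.io.WriteToText('gs://bucket_name/path/to/output'))


    if __name__ == '__main__':
        run()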