
I know Apache Beam and I am able to create pipelines with it, and I also know which Cloud Composer operator to use to run a Dataflow job. I just want to know how to convert plain Apache Beam code into a Dataflow job so that I can run it using Cloud Composer: what settings and configuration will I need? I did not find the Google docs very useful, please help me. My requirement is to read a CSV file from Cloud Storage and load it into BigQuery using Dataflow, then schedule it using Cloud Composer. I am using Python.


1 Answer


Some tentatively useful GCP docs can be found here: https://cloud.google.com/composer/docs/how-to/using/using-dataflow-template-operator

But, in general, if you already have the Beam pipeline written (and it works), then all you really need is to run it with the `DataflowRunner` instead of the default `DirectRunner`, plus the usual GCP options (project, region, temp/staging locations).
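As a rough sketch of what that could look like for your CSV-to-BigQuery case (the project, bucket, dataset, table, and column names below are placeholders, and the naive comma-split parsing is only for illustration):

```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Placeholder values -- replace with your own project, region and bucket.
options = PipelineOptions(
    runner="DataflowRunner",          # this is what makes the Beam job run on Dataflow
    project="my-gcp-project",
    region="us-central1",
    temp_location="gs://my-bucket/temp",
    staging_location="gs://my-bucket/staging",
    job_name="csv-to-bigquery",
)

with beam.Pipeline(options=options) as p:
    (
        p
        # Read the CSV from Cloud Storage, skipping the header row.
        | "ReadCSV" >> beam.io.ReadFromText(
            "gs://my-bucket/input/data.csv", skip_header_lines=1)
        # Turn each line into a dict keyed by (placeholder) column names.
        | "ParseRows" >> beam.Map(
            lambda line: dict(zip(["col_a", "col_b"], line.split(","))))
        # Load the rows into a (placeholder) BigQuery table.
        | "WriteToBQ" >> beam.io.WriteToBigQuery(
            "my-gcp-project:my_dataset.my_table",
            schema="col_a:STRING,col_b:STRING",
            create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
        )
    )
```

If you run this script locally with those options, the Beam SDK submits it to Dataflow for you; the same options can instead be passed in from Composer, as below.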

For a 'custom' Dataflow job, you likely want the following Operator --> https://airflow.apache.org/docs/apache-airflow/1.10.6/_api/airflow/contrib/operators/dataflow_operator/index.html#airflow.contrib.operators.dataflow_operator.DataFlowPythonOperator
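A rough sketch of the DAG, assuming the Airflow 1.10.x contrib operator from that link. The bucket paths, project and the `--input` / `--output_table` flags are placeholders; your Beam script would have to define and parse those flags itself:

```python
from datetime import datetime

from airflow import DAG
from airflow.contrib.operators.dataflow_operator import DataFlowPythonOperator

# Placeholder defaults -- adjust for your environment.
default_args = {
    "start_date": datetime(2021, 1, 1),
    "dataflow_default_options": {
        "project": "my-gcp-project",
        "region": "us-central1",
        "temp_location": "gs://my-bucket/temp",
        "staging_location": "gs://my-bucket/staging",
    },
}

with DAG(
    dag_id="csv_to_bigquery_dataflow",
    default_args=default_args,
    schedule_interval="@daily",   # whatever schedule you need
    catchup=False,
) as dag:

    run_dataflow = DataFlowPythonOperator(
        task_id="run_csv_to_bq",
        # Path to your Beam script, e.g. uploaded next to the DAGs
        # in the Composer environment's bucket.
        py_file="gs://my-composer-bucket/dags/beam/csv_to_bq.py",
        job_name="csv-to-bigquery",
        # These become --input=... and --output_table=... arguments
        # that the Beam script itself must parse (hypothetical flags).
        options={
            "input": "gs://my-bucket/input/data.csv",
            "output_table": "my-gcp-project:my_dataset.my_table",
        },
    )
```

The operator simply invokes your Beam script with those flags, and the script's `DataflowRunner` option is what actually launches the Dataflow job.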

I'm sure you are aware that Cloud Composer is managed Airflow, so you can use 'regular' Airflow operators in your DAGs.