1
votes

Are more pipeline I/O connectors and runtime parameters available with the new version (3.x) of Python? If I am correct, Apache Beam currently provides only file-based IOs for Python: textio, avroio, and tfrecordio. With Java, more options are available, such as file-based IOs, BigQueryIO, BigtableIO, PubSubIO, and SpannerIO.

For my use case I want to use BigQueryIO in a GCP Dataflow pipeline with Python 3.x, but it does not appear to be available yet. Does anyone have an update on when Apache Beam will support it?

2 Answers

3
votes

The Bigtable connector for Python 3 has been under development for some time. There is currently no ETA, but you can follow the relevant pull request in the official Apache Beam repository for updates.

0
votes

BigQueryIO has been available for quite some time in the Apache Beam Python SDK.

A Pub/Sub IO is also available, as well as Bigtable (write only). SpannerIO is currently being worked on.

This page has more details: https://beam.apache.org/documentation/io/built-in/
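For illustration, here is a minimal sketch of reading from and writing to BigQuery with the Python SDK. It assumes an SDK version that provides `beam.io.ReadFromBigQuery` (older releases used `beam.io.Read(beam.io.BigQuerySource(...))` instead). The destination table `my-project:my_dataset.word_counts`, its schema, and the helper `to_bq_row` are hypothetical names chosen for the example, not something from the question.

```python
# Hypothetical destination table and schema; replace with your own.
OUTPUT_TABLE = "my-project:my_dataset.word_counts"
OUTPUT_SCHEMA = "word:STRING,occurrences:INTEGER"


def to_bq_row(word_count):
    """Turn a (word, count) pair into the dict form WriteToBigQuery expects."""
    word, count = word_count
    return {"word": word, "occurrences": count}


def run(argv=None):
    # Imported here so the helper above can be tried without Beam installed.
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    # Reading from BigQuery needs a GCS temp location, e.g. --temp_location.
    options = PipelineOptions(argv)
    with beam.Pipeline(options=options) as pipeline:
        (
            pipeline
            | "ReadWords" >> beam.io.ReadFromBigQuery(
                query="SELECT word FROM "
                      "`bigquery-public-data.samples.shakespeare`",
                use_standard_sql=True)
            | "ExtractWord" >> beam.Map(lambda row: row["word"])
            | "CountWords" >> beam.combiners.Count.PerElement()
            | "ToRow" >> beam.Map(to_bq_row)
            | "WriteCounts" >> beam.io.WriteToBigQuery(
                OUTPUT_TABLE,
                schema=OUTPUT_SCHEMA,
                create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND)
        )


if __name__ == "__main__":
    run()
```

Run it with the usual Dataflow options (`--runner`, `--project`, `--region`, `--temp_location`) passed on the command line.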

UPDATE:

After the OP provided more details, it turned out that using value providers in the BigQuery query string was indeed not supported.

This has been remedied in the following PR: https://github.com/apache/beam/pull/11040 and will most likely be part of the 2.21.0 release.

UPDATE 2: This feature was added in the 2.20.0 release of Apache Beam: https://beam.apache.org/blog/2020/04/15/beam-2.20.0.html
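A rough sketch of how a runtime query parameter might look with that feature, assuming an SDK version where `beam.io.ReadFromBigQuery` accepts a value provider for `query`. The option name `--input_query`, the class `QueryOptions`, the default query, and the helper `format_row` are all made up for this example.

```python
# Hypothetical default; a templated job would override it at launch time.
DEFAULT_QUERY = (
    "SELECT word, word_count "
    "FROM `bigquery-public-data.samples.shakespeare` LIMIT 10"
)


def format_row(row):
    """Render one BigQuery result row (a dict) as 'word: count'."""
    return "{}: {}".format(row["word"], row["word_count"])


def run(argv=None):
    # Imported here so format_row can be tried without Beam installed.
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    # Defined inside run() only to keep the Beam import local to this sketch;
    # in a real module this class would normally live at top level.
    class QueryOptions(PipelineOptions):
        @classmethod
        def _add_argparse_args(cls, parser):
            # A value provider argument is resolved at run time, not at
            # pipeline construction time.
            parser.add_value_provider_argument(
                "--input_query",
                type=str,
                default=DEFAULT_QUERY,
                help="Standard SQL query to run against BigQuery.")

    options = PipelineOptions(argv)
    query = options.view_as(QueryOptions).input_query

    with beam.Pipeline(options=options) as pipeline:
        (
            pipeline
            | "Read" >> beam.io.ReadFromBigQuery(
                query=query, use_standard_sql=True)
            | "Format" >> beam.Map(format_row)
            | "Print" >> beam.Map(print)
        )


if __name__ == "__main__":
    run()
```

The point is that `query` is passed to the read transform as a value provider rather than a plain string, so a Dataflow template can take the query as a launch-time parameter.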

Hope it solves your problem!