0
votes

I am having trouble creating a DataflowRunner job that connects a Pub/Sub source to a BigQuery sink by plugging these two:

apache_beam.io.gcp.pubsub.PubSubSource
apache_beam.io.gcp.bigquery.BigQuerySink

into lines 59 and 74 respectively of the beam/sdks/python/apache_beam/examples/streaming_wordcount.py (https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/streaming_wordcount.py) example on GitHub. After removing lines 61-70 and specifying the correct Pub/Sub and BigQuery arguments, the script runs without errors but does not build the pipeline.
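For context, with lines 61-70 removed the elements go from the source straight to the sink, and a BigQuery sink in the Python SDK consumes dictionaries keyed by column name. Below is a minimal plain-Python sketch of the per-element conversion that could sit between the two, for example wrapped in beam.Map. The function name payload_to_row and the column name "message" are hypothetical, and it assumes each Pub/Sub payload is a UTF-8 string or bytes value. It does not address the streaming limitation.

```python
def payload_to_row(payload, column="message"):
    """Turn one Pub/Sub payload into a dict shaped like a BigQuery row.

    Both the function name and the default column name are hypothetical;
    the column must match a field in the destination table's schema.
    """
    # Payloads may arrive as raw bytes or as already-decoded text.
    if isinstance(payload, bytes):
        payload = payload.decode("utf-8")
    # Drop surrounding whitespace such as a trailing newline.
    return {column: payload.strip()}
```

In the pipeline this would be applied to every element read from the source before the write step, so that the sink receives one dictionary per message.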

Side note: the script mentions that streaming pipeline support isn't available for use in Python. However, the Beam docs say that apache_beam.io.gcp.pubsub.PubSubSource is only available for streaming (first sentence underneath the "apache_beam.io.gcp.pubsub module" heading: https://beam.apache.org/documentation/sdks/pydoc/2.0.0/apache_beam.io.gcp.html#module-apache_beam.io.gcp.pubsub).

1 Answer

4
votes

You can't stream on Python Dataflow - for now.

Monitor this changelog to find out when that changes:

(soon!)