
I am reading some customer records from a lookup and writing them into a BigQuery table. Then, from that same table, I read the required data fields and try to push that data (as JSON) into Pub/Sub as messages, using a Dataflow pipeline in batch mode. But I am getting this error: "ValueError: Cloud Pub/Sub is currently available for use only in streaming pipelines".

delete_rows = p | 'reading data to be deleted' >> beam.io.Read(
    beam.io.BigQuerySource(
        query=delete_query,
        use_standard_sql=True))

required_data = delete_rows | 'Retrieving only required data' >> beam.ParDo(RequiredData())

push_to_pubsub = required_data | 'Pushing data to pubsub' >> beam.io.WriteToPubSub(
    topic='my topic name',
    with_attributes=False,
    id_label=None,
    timestamp_attribute=None)

I would like to use Pub/Sub in the batch mode of a Dataflow pipeline.


1 Answer


Thanks for trying this out. Cloud Pub/Sub support in the Dataflow Python SDK is currently implemented as a Dataflow native source that is only available on the Dataflow Python streaming backend. We can look into providing an implementation that works for batch pipelines in the future, but I don't have an ETA for this.
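
In the meantime, a workaround that is sometimes used is to publish from a plain ParDo with the google-cloud-pubsub client library directly, which sidesteps the native sink and so works in batch mode. The sketch below is untested and makes a few assumptions: the google-cloud-pubsub package is installed on the workers, 'my-project' and 'my-topic' are placeholders, and PublishToPubSubFn is a hypothetical helper, not part of the Beam API.

import json

import apache_beam as beam
from google.cloud import pubsub_v1


class PublishToPubSubFn(beam.DoFn):
    """Hypothetical DoFn that publishes each element to Pub/Sub using
    the google-cloud-pubsub client instead of beam.io.WriteToPubSub."""

    def __init__(self, project, topic):
        self.project = project
        self.topic = topic

    def start_bundle(self):
        # The client is not picklable, so create it per bundle rather
        # than in __init__.
        self.publisher = pubsub_v1.PublisherClient()
        self.topic_path = self.publisher.topic_path(self.project, self.topic)
        self.futures = []

    def process(self, element):
        # Pub/Sub message payloads must be bytes.
        data = json.dumps(element).encode('utf-8')
        self.futures.append(self.publisher.publish(self.topic_path, data=data))

    def finish_bundle(self):
        # Block until every publish in this bundle has completed,
        # so publish failures surface as pipeline errors.
        for future in self.futures:
            future.result()


# Usage, replacing beam.io.WriteToPubSub in the pipeline above:
# required_data | 'Pushing data to pubsub' >> beam.ParDo(
#     PublishToPubSubFn('my-project', 'my-topic'))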