
When creating a new Dataflow Pub/Sub to BigQuery template, it is possible to specify the Pub/Sub topic. However, there appears to be no way to provide an existing Pub/Sub subscription; instead, the Dataflow template creates a new subscription each time it runs.

As far as I understand the Pub/Sub model, the only way to make sure we continue reading data from the same position in the topic is to reuse the same subscription, and there seems to be no such option here.

What will happen when a user wants to re-deploy such a Dataflow template? Are we going to lose all the data between deployments?


1 Answer


You're right: the Google-provided Pub/Sub to BigQuery template does not support passing a subscription as a parameter (here's an older answer by a Googler confirming this). However, it should be easy to edit the template so that it does. You would only need to replace `getInputTopic` with a `getSubscription` equivalent. In turn, this should be passed to `PubsubIO.readMessagesWithAttributes().fromSubscription(options.getSubscription())` (see here) instead of `fromTopic`. After creating your new pipeline, you'd need to create and stage your template.
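To illustrate, here is a minimal sketch of what the edited template code might look like. The interface and option names (`SubscriptionOptions`, `getSubscription`) are illustrative, not part of the official template; only `PubsubIO.readMessagesWithAttributes().fromSubscription(...)` and `ValueProvider` are real Beam APIs:

```java
import org.apache.beam.sdk.options.Description;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.ValueProvider;
import org.apache.beam.sdk.io.gcp.pubsub.PubsubIO;

// Hypothetical options interface replacing the template's getInputTopic.
// ValueProvider<T> lets the value be supplied at template execution time.
public interface SubscriptionOptions extends PipelineOptions {
  @Description("Pub/Sub subscription to read from, in the form "
      + "projects/<project>/subscriptions/<subscription>")
  ValueProvider<String> getSubscription();

  void setSubscription(ValueProvider<String> value);
}

// Then, in the pipeline construction, swap the fromTopic(...) read for:
//
//   pipeline.apply("ReadFromSubscription",
//       PubsubIO.readMessagesWithAttributes()
//           .fromSubscription(options.getSubscription()));
```

Because the subscription is a `ValueProvider`, the same staged template can be re-deployed against the same subscription each time, so reads resume from wherever that subscription left off.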