1
votes

Our python Dataflow pipeline works locally but not when deployed using the Dataflow managed service on Google Cloud Platform. It doesn't show signs that it is connected to the PubSub subscription. We have tried subscribing to both subscription and topic, neither of them worked. The messages accumulate in the PubSub subscription and the Dataflow pipeline doesn't show signs of being called or anything. We have double-checked the project is the same

Any directions on this would be very much appreciated

Here is the code to connect to a pull subscription

with beam.Pipeline(options=options) as p:
        something = p | "ReadPubSub" >> beam.io.ReadFromPubSub(
            subscription="projects/PROJECT_ID/subscriptions/cloudflow"
        )

Here goes the options used

 options = PipelineOptions()
 file_processing_options = PipelineOptions().view_as(FileProcessingOptions)
 if options.view_as(GoogleCloudOptions).project is None:
        print(sys.argv[0] + ": error: argument --project is required")
        sys.exit(1)
 options.view_as(SetupOptions).save_main_session = True
 options.view_as(StandardOptions).streaming = True

The PubSub subscription has this configuration:

Delivery type: Pull
Subscription expiration: Subscription expires in 31 days if there is no activity.
Acknowledgement deadline: 57 Seconds
Subscription filter: —
Message retention duration: 7 Days
Retained acknowledged messages: No
Dead lettering: Disabled
Retry policy : Retry immediately
1
Can you share the part of the pipeline where you connect to the PubSub and the options of the pipeline? Can you detail the type subscription that you use for Dataflow? have you double check that is the same project?guillaume blaquiere
Updated post to provide for that informationGRT
There is nothing strange. What's the version of Beam sdk that you use in your dependency?guillaume blaquiere
The version is 2.20.0GRT
Do you use special service account on your pipeline? In any case, does the used service account have access to the pubsub?guillaume blaquiere

1 Answers

1
votes

I think for Pulling from subscription we need to pass with_attributes parameter as True.

with_attributes – True - output elements will be PubsubMessage objects. False - output elements will be of type bytes (message data only).

Found similar one here: When using Beam IO ReadFromPubSub module, can you pull messages with attributes in Python? It's unclear if its supported