
I have a streaming apache beam pipeline that I want to deploy using DataflowRunner. My pipeline code looks something like this:

with beam.Pipeline(options=pipeline_options) as p:
    (p | "Read input from PubSub" >>
     beam.io.ReadFromPubSub(subscription=known_args.subscription)
     # ...

Then I deploy the pipeline like this: `python3 main.py --runner=DataflowRunner --streaming ...`. The pipeline deploys successfully, but the problem is that the process never exits; it keeps printing logs coming from the workers.

Is there a way to start the pipeline, check that it's in a running state, and then exit the process?


1 Answer


I believe that when you use `with beam.Pipeline() as p:` the process will by default wait until the pipeline finishes or is terminated, because the context manager's `__exit__` method calls `run()` and then `wait_until_finish()` on the result; see https://github.com/apache/beam/blob/93c2bd8c8a7988f99a1299b9a1dd3a01122a35be/sdks/python/apache_beam/pipeline.py#L581.

Alternatively you can try

p = beam.Pipeline(options=pipeline_options)
p | ...
p.run()

On DataflowRunner, `run()` is non-blocking: it submits the job and returns a `DataflowPipelineResult` immediately, so your script can exit while the job keeps running on Dataflow.
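To also "check that it's in a running state" before exiting, you can poll the `state` property of the result that `run()` returns. A minimal sketch of such a poller, where `wait_until_running` is a hypothetical helper name (not a Beam API) and it works with any object exposing a `.state` attribute, such as a `DataflowPipelineResult` (Beam's `PipelineState` constants are plain strings like `'RUNNING'`):

```python
import time

# Hypothetical helper: poll a PipelineResult-like object until the job
# either reaches RUNNING or hits a terminal state, then return that state.
# Beam's PipelineState values (PipelineState.RUNNING, etc.) are strings,
# so plain string comparison works here.
def wait_until_running(result, poll_interval=5.0, timeout=300.0):
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        state = result.state
        if state in ("RUNNING", "DONE", "FAILED", "CANCELLED"):
            return state
        time.sleep(poll_interval)
    raise TimeoutError("pipeline did not reach RUNNING within timeout")
```

With your pipeline that would look something like `state = wait_until_running(p.run())`, after which you can raise if `state == "FAILED"` or simply let the process exit; the streaming job continues on Dataflow either way.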