2 votes

Is there a way to publish a message to Google Pub/Sub after a Google Dataflow job completes? We need to notify dependent systems that processing of the incoming data is complete. How could Dataflow publish a message after writing data to the sink?

EDIT: We want to send a notification after the pipeline finishes writing to GCS. Our pipeline looks like this:

 
Pipeline p = Pipeline.create(options);
p.apply(....)
 .apply(AvroIO.Write.named("Write to GCS")
                    .withSchema(Extract.class)
                    .to(options.getOutputPath())
                    .withSuffix(".avro"));
p.run();

If we add logic outside of the pipeline's apply(...) chain, we are notified when the code finishes executing, not when the pipeline itself has completed. Ideally we could add another .apply(...) after the AvroIO sink and publish a message to Pub/Sub from there.

There is nothing stopping you from writing a message to a Pub/Sub topic when your pipeline(s) finish. You don't need Dataflow to do this. – Graham Polley
One may need to use the BlockingPipelineRunner (cloud.google.com/dataflow/pipelines/…) to achieve the desired effect. – Tudor Marian
I have a similar use case, but my pipeline is running in streaming mode with an hourly interval. I want to publish a message to Pub/Sub after each window's Write is complete. – gilmatic

1 Answer

1 vote

You have two options for getting notified when your pipeline finishes, so that you can then publish a message (or do whatever else you need to do once the job is done):

  1. Use the BlockingPipelineRunner. This will run your pipeline synchronously, so any code placed after run() only executes once the job has finished (see the first sketch below).
  2. Use the DataflowPipelineRunner. This will run your pipeline asynchronously. You can then poll the job for its status and wait for it to finish (see the second sketch below).
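
A minimal sketch of option 1, assuming the Dataflow SDK 1.x package layout (where the blocking runner is exposed as BlockingDataflowPipelineRunner in com.google.cloud.dataflow.sdk.runners). The publishDoneMessage() helper and the topic name are placeholders for whatever Pub/Sub publishing client you use:

    import com.google.cloud.dataflow.sdk.Pipeline;
    import com.google.cloud.dataflow.sdk.options.DataflowPipelineOptions;
    import com.google.cloud.dataflow.sdk.options.PipelineOptionsFactory;
    import com.google.cloud.dataflow.sdk.runners.BlockingDataflowPipelineRunner;

    DataflowPipelineOptions options =
        PipelineOptionsFactory.fromArgs(args).withValidation().as(DataflowPipelineOptions.class);
    // Run the job synchronously: run() does not return until the job has finished.
    options.setRunner(BlockingDataflowPipelineRunner.class);

    Pipeline p = Pipeline.create(options);
    // ... build the pipeline, including the AvroIO.Write sink ...
    p.run();

    // At this point the job has completed (or run() has thrown), so it is safe to
    // notify downstream systems. publishDoneMessage() is a hypothetical helper
    // wrapping your Pub/Sub client of choice.
    publishDoneMessage("projects/my-project/topics/processing-complete");

And a sketch of option 2 with the non-blocking DataflowPipelineRunner, whose run() returns a DataflowPipelineJob handle. The getState()/isTerminal() polling below follows the 1.x SDK as I recall it, so check the method names against the SDK version you are running:

    import com.google.cloud.dataflow.sdk.PipelineResult;
    import com.google.cloud.dataflow.sdk.runners.DataflowPipelineJob;
    import com.google.cloud.dataflow.sdk.runners.DataflowPipelineRunner;

    options.setRunner(DataflowPipelineRunner.class);
    Pipeline p = Pipeline.create(options);
    // ... build the pipeline ...

    // run() returns immediately with a handle to the remote job.
    DataflowPipelineJob job = (DataflowPipelineJob) p.run();

    // Poll until the job reaches a terminal state (the enclosing method is
    // assumed to declare "throws InterruptedException" for the sleep).
    while (!job.getState().isTerminal()) {
        Thread.sleep(30000);
    }

    // Only notify if the job actually succeeded.
    if (job.getState() == PipelineResult.State.DONE) {
        publishDoneMessage("projects/my-project/topics/processing-complete");
    }

The trade-off is the usual one: the blocking runner is simpler if the launching process can stay alive for the duration of the job, while the asynchronous runner plus polling lets the launcher exit or monitor several jobs at once.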