4
votes

I would like to route dynamically the different elements of a PCollection to different PubSub topics, based on the content of a field. The topics are not persistent but it is assumed that they exist when PubSubIO.Write() is executed at runtime. So Dataflow should only infer their names at runtime on a per-message basis.

The feature exists for BigQuery and dynamic table names : https://beam.apache.org/documentation/sdks/javadoc/2.0.0/org/apache/beam/sdk/io/gcp/bigquery/DynamicDestinations.html

Is there a way to do something similar with PubSubIO ?

Maybe not based on the message content but on an attribute ? https://beam.apache.org/documentation/sdks/javadoc/0.6.0/org/apache/beam/sdk/io/PubsubIO.PubsubMessage.html#getAttribute-java.lang.String-

1

1 Answers

2
votes

Is there a way to do something similar with PubSubIO ?

There is no equivalent to DynamicDestinations for Pub/Sub.

You'll need to know all of the Pub/Sub topics ahead of time and have them defined in the pipeline. The pipeline can be partitioned based on some value or attribute of the Pub/Sub message and routed to the appropriate Pub/Sub topic. The Partition transform will examine the PubsubMessage and determine which partition the message belongs.

Reference: Partition

Maybe not based on the message content but on an attribute ?

Yes, you can access a message's attributes.