1
votes

I have a two Google Pub/Sub topics my pipelines are streaming from with the Windowing. Technically I have two pipelines individually for each of the topics and I need to merge these two pipeline Windows to a single one to do some aggregation which requires combined events within the same Window.

Say we have Event1 and Event2. These two events have two separate topics say Topic1 and Topic2. I have Pipeline1 and Pipeline2 which individually streams from those topics. I need to somehow get access to Event1 and Event2 which fall within the same Window and produce some output. Is this possible?

1

1 Answers

3
votes

You can read from multiple Pubsub topics in the same pipeline like so:

Pipeline p = ...;

PCollection<A> collection1 = p.apply(PubsubIO.Read.topic(topic1));
PCollection<B> collection2 = p.apply(PubsubIO.Read.topic(topic2));

Now, how you want to combine these two PCollections depends on your application. You will probably want to read Handling Multiple PCollections. Here is a quick mention of three possibilities:

  1. Flatten: if you just want to merge the contents of the two collections on a per-window basis, this will do it.

  2. ParDo with side inputs: if windows of one collection are fairly small, then reading this as a side input of a ParDo over the larger collection may be reasonable.

  3. Joins with CoGroupByKey: you can implement many sorts of joins between the two collections by keying them on some common key and using CoGroupByKey.