I have a BigQuery dimension table (which rarely changes) and streaming JSON data from PubSub. I want to query this dimension table, enrich the incoming PubSub data by joining it against the table, and then write the joined stream to another BigQuery table.
As I am new to Dataflow/Beam and the concepts are still not that clear to me (or at least I have difficulty getting started on the code), I have a number of questions:
- What is the best template or pattern I can use to do that? Should I apply the BigQuery PTransform first (followed by the PubSub PTransform), or the PubSub PTransform first?
- How can I do the join? Is it something like the following?
ParDo.of(...).withSideInputs(PCollectionView<Map<String, String>> map)
- What is the best window setting for the PubSub input? Is it correct that the window setting for the BigQuery part of the pipeline is different from the one for the PubSub part?