We have a streaming pipeline running on Google Cloud Dataflow workers that reads from a PubSub subscription, groups messages, and writes them to BigQuery. The built-in BigQuery sink does not fit our needs, because we need to target a specific dataset and table for each group. Since custom sinks are not supported for streaming pipelines, it seems the only solution is to perform the insert operations in a ParDo. Something like this:
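A minimal sketch of the routing part of such a ParDo. The class name `TableRouter`, the project/dataset names, and `sanitize()` are all assumptions for illustration, not part of the Dataflow SDK; inside the pipeline, a `DoFn`'s `processElement` would use something like this to pick the target table and then perform the streaming insert through the BigQuery API (`tabledata().insertAll(...)`):

```java
// Sketch only: maps a message group to its per-group BigQuery table.
// All names here are hypothetical; the DoFn wrapping this would hold a
// BigQuery client and call tabledata().insertAll() on the returned table.
public class TableRouter {
    private final String projectId;
    private final String datasetId;

    public TableRouter(String projectId, String datasetId) {
        this.projectId = projectId;
        this.datasetId = datasetId;
    }

    // Fully qualified table spec, e.g. "my-project:my_dataset.orders".
    public String tableSpecFor(String groupKey) {
        return projectId + ":" + datasetId + "." + sanitize(groupKey);
    }

    // BigQuery table IDs allow only letters, digits, and underscores.
    static String sanitize(String name) {
        return name.replaceAll("[^A-Za-z0-9_]", "_");
    }
}
```

One thing to keep in mind with this approach: you take on the sink's responsibilities yourself, so retries, batching of insert requests, and handling of `insertErrors` in the API response all have to live in the ParDo.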
Is there any known issue with not having a sink in a pipeline, or anything to be aware of when writing this kind of pipeline?
Side outputs can be used to write to N BigQuery sinks. Could this work for you too? – Graham Polley
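The idea behind the side-output suggestion can be illustrated outside the SDK as demultiplexing one stream into per-group collections; in the Dataflow SDK itself this is done with `TupleTag`s and `ParDo.withOutputTags`, with each tagged output feeding its own BigQuery sink. The class and method names below are illustrative stand-ins, not Dataflow API calls:

```java
import java.util.*;

// Sketch: split (groupKey, payload) pairs into per-group lists, standing in
// for the SDK's side-output mechanism. Each resulting list would correspond
// to one tagged output wired to its own BigQuery sink.
public class Demux {
    public static Map<String, List<String>> byGroup(List<String[]> messages) {
        Map<String, List<String>> out = new HashMap<>();
        for (String[] kv : messages) {
            out.computeIfAbsent(kv[0], k -> new ArrayList<>()).add(kv[1]);
        }
        return out;
    }
}
```

The limitation to be aware of is that the set of tags, and hence the set of target tables, must be known when the pipeline is constructed, whereas the ParDo approach can compute table names at runtime.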