I would like to build the following pipeline:
Pub/Sub --> Dataflow --> BigQuery
The data is streaming, but I would like to avoid writing it into BigQuery via streaming inserts. Instead, I was hoping to batch up small chunks on the Dataflow workers and write them into BigQuery as load jobs once they reach a certain size or age.
I cannot find any examples of how to do this using the Python Apache Beam SDK, only Java ones.
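Based on the Java examples, this is roughly what I am imagining in Python. The topic and table names are placeholders, and I am not sure whether the `FILE_LOADS` method together with `triggering_frequency` actually works for a streaming pipeline in the Python SDK, so treat this as a sketch of the intent rather than something I know to be supported:

```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Placeholder names -- substitute your own project, topic and table.
TOPIC = "projects/my-project/topics/my-topic"
TABLE = "my-project:my_dataset.my_table"

options = PipelineOptions(streaming=True)

with beam.Pipeline(options=options) as p:
    (
        p
        | "ReadFromPubSub" >> beam.io.ReadFromPubSub(topic=TOPIC)
        | "Parse" >> beam.Map(lambda msg: {"payload": msg.decode("utf-8")})
        | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
            TABLE,
            schema="payload:STRING",
            # Use BigQuery load jobs instead of the streaming insert API.
            method=beam.io.WriteToBigQuery.Method.FILE_LOADS,
            # How often (in seconds) to flush accumulated rows as a load job;
            # this is the "batch up chunks and load them periodically" part.
            triggering_frequency=300,
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
        )
    )
```

Is this the right approach, or is there a Python example somewhere that shows how to do this properly?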