I want to understand how a Dataflow pipeline works.
In my case, something is published to Cloud Pub/Sub periodically, which Dataflow then writes to BigQuery. The volume of messages coming through is in the thousands, so my publisher client uses batch settings of 1000 messages, 1 MB, and 10 seconds max latency (see the sketch below).
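For reference, here is roughly how my publisher is configured. This is a minimal sketch using the Python Pub/Sub client; the project and topic names are placeholders:

```python
from google.cloud import pubsub_v1

# Placeholder names for illustration.
PROJECT_ID = "my-project"
TOPIC_ID = "my-topic"

# Batch settings matching the values described above: the client-side
# batch is flushed when ANY of these thresholds is reached first.
batch_settings = pubsub_v1.types.BatchSettings(
    max_messages=1000,      # flush at 1000 messages...
    max_bytes=1024 * 1024,  # ...or at 1 MB of payload...
    max_latency=10,         # ...or after 10 seconds
)

publisher = pubsub_v1.PublisherClient(batch_settings=batch_settings)
topic_path = publisher.topic_path(PROJECT_ID, TOPIC_ID)

for i in range(5000):
    data = f"message {i}".encode("utf-8")
    # publish() is asynchronous: messages accumulate in the client-side
    # batch until one of the thresholds above triggers a flush.
    publisher.publish(topic_path, data)
```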
The question is: when messages are published in a batch, does Dataflow SQL take all the messages in the batch and write them to BigQuery in one go, or does it write one message at a time?
And is there any benefit of one approach over the other?
Please comment if any other details are required. Thanks!