
Is Google Cloud Dataflow smart enough to take advantage of multi-core processors automatically?

For example, if I have a ParDo that uses only a single core, and I am using only a single worker, but I have passed --workerMachineType=n1-standard-2, will Dataflow run two parallel ParDo instances?


1 Answer


Yes, Dataflow runs multithreaded and will run multiple ParDo instances on the same worker.

Keep in mind, however, that if you use a GroupByKey, the ParDo will process the elements for any particular key serially. You still achieve parallelism on the worker, because multiple keys are processed at once. But if all of your data is on a single "hot key", you may not achieve good parallelism.
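
To make the hot-key effect concrete, here is a rough sketch in plain Python. It is a toy model, not Dataflow's actual scheduler: it assumes every element costs one unit of time, that elements sharing a key must run one after another, and that whole keys are handed to whichever thread is least loaded. The function name `estimated_makespan` and the sample keys are made up for illustration.

```python
import heapq
from collections import Counter


def estimated_makespan(keys, num_threads, cost_per_element=1):
    """Estimate time units needed under a toy model (an assumption, not
    Dataflow's real scheduler): elements with the same key run serially,
    while different keys may run on different threads."""
    # Total serial work per key, largest first.
    per_key_work = sorted(
        (count * cost_per_element for count in Counter(keys).values()),
        reverse=True,
    )
    # Min-heap of per-thread finish times; each key goes to the
    # currently least-loaded thread.
    threads = [0] * num_threads
    heapq.heapify(threads)
    for work in per_key_work:
        earliest = heapq.heappop(threads)
        heapq.heappush(threads, earliest + work)
    return max(threads)


# Eight elements spread evenly over four keys.
balanced = [f"key-{i % 4}" for i in range(8)]
# Eight elements, seven of which share one "hot" key.
hot = ["hot"] * 7 + ["cold"]

print(estimated_makespan(balanced, num_threads=2))
print(estimated_makespan(hot, num_threads=2))
```

In this model, the second thread cuts the time for the balanced input in half, while the hot-key input barely benefits, because the seven "hot" elements still have to run one after another.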