I am using Apache Beam to read data from Google Cloud Datastore via Beam's built-in `io.gcp.datastore.v1.datastoreio` Python API.
I run my pipeline on Google Cloud Dataflow.
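For context, my read is set up roughly like the sketch below (the project ID and entity kind are placeholders, not my real values):

```python
import apache_beam as beam
from apache_beam.io.gcp.datastore.v1.datastoreio import ReadFromDatastore
from google.cloud.proto.datastore.v1 import query_pb2

# Placeholder project ID and kind for illustration.
PROJECT = 'my-gcp-project'

# Build a protobuf query for the kind I want to read.
query = query_pb2.Query()
query.kind.add().name = 'MyKind'

with beam.Pipeline() as pipeline:
    entities = (
        pipeline
        | 'ReadFromDatastore' >> ReadFromDatastore(project=PROJECT, query=query)
    )
    # ...downstream transforms process the entities emitted by this read.
```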
I want to ensure that my workers are not overloaded with data. How can I read the data in batches, or use some other mechanism to ensure that each worker does not pull a huge amount of data in one go?