
I have a problem with a large number of parallel workers in an Apache Beam streaming job (Dataflow backend, Python SDK). The worker logs show:

Initializing SDKHarness with unbounded number of workers.

It seems that Beam creates several hundred DoFn instances within a few seconds of startup, all on a single VM/worker.

I can't find the place in the source code where I can limit this "unbounded" number.

I need to limit them, because I make external calls in process() and setup(), and I need to reduce the outgoing RPS.
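For context, the kind of cap I'm after could be sketched with a process-level semaphore shared across DoFn instances. This is only an illustration of the idea, not a Beam API: `MAX_CONCURRENT_CALLS` and `call_external_service` are hypothetical names, and in a real pipeline the class would subclass `beam.DoFn`.

```python
import threading
import time

# Module-level semaphore shared by every DoFn instance in this worker
# process: even if the runner creates hundreds of instances, at most
# MAX_CONCURRENT_CALLS external requests are in flight at once.
MAX_CONCURRENT_CALLS = 4
_call_slots = threading.Semaphore(MAX_CONCURRENT_CALLS)

def call_external_service(element):
    """Stand-in for the real RPC that must be rate-limited."""
    time.sleep(0.01)
    return element

class ThrottledDoFn:  # in a real pipeline this would subclass beam.DoFn
    def process(self, element):
        with _call_slots:  # blocks while all slots are taken
            yield call_external_service(element)
```

This bounds concurrency per worker process, but it does not stop the runner from creating the DoFn instances in the first place, which is what I'd prefer to control.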


1 Answer


If you are using Dataflow Runner v2, enabled via:

--experiments=use_runner_v2

you can use the following parameter to set the number of threads per process:

--number_of_worker_harness_threads
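Assuming a Dataflow Python pipeline launched from the command line, the two flags would be combined like this (the script name and the thread count of 10 are placeholders to adapt to your job):

```shell
python pipeline.py \
  --runner=DataflowRunner \
  --streaming \
  --experiments=use_runner_v2 \
  --number_of_worker_harness_threads=10
```

Note this caps the harness threads per SDK process, which in turn bounds how many process() calls run concurrently on each worker.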