I have a heavily I/O-bound Beam pipeline (Java). On Google Cloud Dataflow I set the pipeline option `setNumberOfWorkerHarnessThreads(16)` so that every virtual CPU runs 16 worker threads (first snippet below).

I'm now trying to port the same pipeline to Spark, and I can't find an equivalent option there. I've tried doing my own threading inside the `DoFn`, but that appears to cause problems on the SparkRunner, because `@ProcessElement` returns while the output to the `ProcessContext` only happens later, when my worker thread completes (second snippet below). I get intermittent `ConcurrentModificationException`s whose stack traces point into Beam internals rather than my user code.
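For reference, this is roughly how I set the option on Dataflow today (a minimal sketch; the surrounding class and option parsing are just illustrative):

```java
import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class DataflowThreads {
  public static void main(String[] args) {
    // Standard option parsing, then bump the worker harness thread count.
    DataflowPipelineOptions options =
        PipelineOptionsFactory.fromArgs(args).withValidation().as(DataflowPipelineOptions.class);
    options.setNumberOfWorkerHarnessThreads(16);
    // ... Pipeline.create(options), build the transforms, and run as usual.
  }
}
```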
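And this is roughly the shape of the hand-rolled threading that breaks on the SparkRunner (a simplified sketch; `AsyncLookupFn` and `slowIoCall` are stand-ins for my real DoFn and blocking I/O call):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import org.apache.beam.sdk.transforms.DoFn;

class AsyncLookupFn extends DoFn<String, String> {
  private transient ExecutorService pool;

  @Setup
  public void setup() {
    pool = Executors.newFixedThreadPool(16);
  }

  @ProcessElement
  public void processElement(ProcessContext c) {
    String element = c.element();
    // Blocking I/O is pushed onto a pool thread...
    CompletableFuture
        .supplyAsync(() -> slowIoCall(element), pool)
        // ...and the result is emitted from that pool thread, after
        // processElement has already returned, which appears to be
        // where the trouble starts.
        .thenAccept(c::output);
  }

  @Teardown
  public void teardown() {
    pool.shutdown();
  }

  private String slowIoCall(String in) {
    return in; // placeholder for the real blocking I/O
  }
}
```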
Is there an equivalent to that setting on the SparkRunner?