I have a large batched parallel computation that I use a parallel map for in scala. I have noticed that there appears to be a gradual downstepping of CPU usage as the workers finish. It all comes down to a call to a call inside of the Map object
scala.collection.parallel.thresholdFromSize(length, tasksupport.parallelismLevel)
Looking at the code, I see this:
def thresholdFromSize(sz: Int, parallelismLevel: Int) = {
val p = parallelismLevel
if (p > 1) 1 + sz / (8 * p)
else sz
}
My calculation works great on a large number of cores, and now I understand why..
thesholdFromSize(1000000,24) = 5209
thesholdFromSize(1000000,4) = 31251
If I have an array of length 1000000 on 24 CPU's it will partition all the way down to 5209 elements. If I pass that same array into the parallel collections on my 4 CPU machine, it will stop partitioning at 31251 elements.
It should be noted that the runtime of my calculations is not uniform. Runtime per unit can be as much as 0.1 seconds. At 31251 items, that's 3100 seconds, or 52 minutes of time where the other workers could be stepping in and grabbing work, but are not. I have observed exactly this behavior while monitoring CPU utilization during the parallel computation. Obviously I'd love to run on a large machine, but that's not always possible.
My question is this: Is there any way to influence the parallel collections to give it a smaller threshold number that is more suited to my problem? The only thing I can think of is to make my own implementation of the class 'Map', but that seems like a very non-elegant solution.