4 votes

I'm using Java's fork-join framework to deal with a CPU-intensive calculation.

I've tweaked the "sequential threshold" (used to decide whether to create subtasks or do the work directly) a bit, but to my disappointment, going from single-threaded to 4+4 cores only about doubles the overall performance. The pool does report 8 CPUs, and when I manually set the parallelism to 2, 3, 4, ... I see gradual increases in performance, but it still tops out at about twice the single-thread throughput. Also, the Linux System Activity monitor hovers around 50% for that Java process.

Also very suspicious is the fact that when I start multiple Java processes, the collective throughput is much more in line with expectations (almost 4 times faster than a single thread) and the System Activity monitor shows correspondingly higher CPU use.

Is it possible that there is a limitation in Java, Linux, or the fork/join framework that prevents full CPU usage? Any suggestions or similar experiences?

NB. This is on an Intel i7-3770 CPU (4 physical cores plus hyper-threading, so 8 logical CPUs), running Oracle Java 7u13 on a Linux Mint box.
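For reference, the tasks follow the usual divide-and-conquer shape; here is a simplified sketch (the identifiers, the dummy work, and the threshold value are illustrative, not my real code):

    import java.util.concurrent.ForkJoinPool;
    import java.util.concurrent.RecursiveTask;

    // Simplified stand-in for the real task: sums f(i) over [lo, hi),
    // splitting until a range falls below the sequential threshold.
    class Work extends RecursiveTask<Double> {
        static final int THRESHOLD = 10_000; // the knob I've been tweaking

        final int lo, hi;

        Work(int lo, int hi) { this.lo = lo; this.hi = hi; }

        @Override
        protected Double compute() {
            if (hi - lo <= THRESHOLD) {       // small enough: do the work directly
                double sum = 0;
                for (int i = lo; i < hi; i++)
                    sum += Math.sqrt(i);      // placeholder for the real CPU-bound work
                return sum;
            }
            int mid = (lo + hi) >>> 1;        // otherwise: split into two subtasks
            Work left = new Work(lo, mid);
            left.fork();                      // run the left half asynchronously
            double right = new Work(mid, hi).compute();
            return right + left.join();
        }
    }

    // new ForkJoinPool().invoke(new Work(0, 100_000_000)) kicks it off;
    // the no-arg pool sizes itself to the number of available processors.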

To understand what's happening in your fork-join setup, you need to figure out where the bottleneck is. Beyond that, it's really hard for us to make specific suggestions based just on the information in your question. - NPE
Interesting situation. Regarding the parallelization speedup, Amdahl's law gives the theoretical limit. - linski
Looks like a lot of blocking system calls are putting the threads into a wait state, thereby lowering the CPU load reported by the kernel. - Ralf H
@Ray you should look around for already-written performance tests for FJ. Run them on your machine; if you see 100% utilization there, there is a good chance the problem is in your code. - John Vint
Post a little code. What does your compute() look like? Just a little code, not every step. - edharned

1 Answer

1 vote

Thanks for the thoughts and answers, everyone! Your suggestions convinced me that the problem was not the framework itself, so I went on to do more testing and found that after a few minutes the CPU load dropped to 15%!

It turns out that java.util.Random (which I use extensively) performs poorly when shared between threads: every nextXXX() call updates a single internal seed via compare-and-set, so all the threads end up serializing on that one atomic variable. The solution was to call ThreadLocalRandom.current().nextXXX() instead, which keeps per-thread generator state. I'm now up to a consistent 80% CPU usage (there are still some sequential passages left). Sweet!
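For anyone who lands here with the same symptom, the change is mechanical. A minimal sketch (Java 7+, where java.util.concurrent.ThreadLocalRandom lives; the class and method names are mine, just for illustration):

    import java.util.Random;
    import java.util.concurrent.ThreadLocalRandom;

    public class RandomContention {
        // Shared instance: every nextXXX() call updates one internal seed
        // via compareAndSet, so all threads serialize on that atomic.
        static final Random SHARED = new Random();

        static double before() {
            return SHARED.nextDouble();        // heavily contended across threads
        }

        static double after() {
            // Per-thread generator state: no shared seed to fight over.
            // Always fetch it via current() at the call site; caching the
            // instance and handing it to another thread defeats the purpose.
            return ThreadLocalRandom.current().nextDouble();
        }
    }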

Thanks again for putting me on the right track.