Sequential stream is faster than parallel stream if number of iterations is increased

Question

I measure performance with the example code at the end.

If I call the checkPerformanceResult method with the parameter numberOfTimes set to 100, the parallel stream significantly outperforms the sequential stream (sequential=346 ms, parallel=78 ms).

If I set the parameter to 1000, the sequential stream significantly outperforms the parallel stream (sequential=3239 ms, parallel=9337 ms).

I did a lot of runs and the result is the same.

Can someone explain this behaviour to me and tell me what is going on under the hood here?

import java.util.function.Supplier;
import java.util.stream.IntStream;

public class ParallelStreamExample {
    public static long checkPerformanceResult(Supplier<Integer> s, int numberOfTimes) {
        long startTime = System.currentTimeMillis();
        for (int i = 0; i < numberOfTimes; i++) {
            s.get();
        }
        long endTime = System.currentTimeMillis();
        return endTime - startTime;
    }

    public static int sumSequentialStreamThread() {
        IntStream.rangeClosed(1, 10000000).sum();
        return 0;
    }

    public static int sumParallelStreamThread() {
        IntStream.rangeClosed(1, 10000000)
                .parallel().sum();
        return 0;
    }

    public static void main(String[] args) {
        System.out.println(checkPerformanceResult(ParallelStreamExample::sumSequentialStreamThread, 1000));
        System.out.println("break");
        System.out.println(checkPerformanceResult(ParallelStreamExample::sumParallelStreamThread, 1000));
    }
}

Microbenchmarks are notoriously unreliable. You run each test once. Also, you return a constant zero. The correct algorithm is return ((n + 1) * n) / 2; - no loops or ranges required. — Elliott Frisch
Returning a constant shouldn't have any impact in this case? Besides, I agree it is confusing. Also, I'm not interested in the fastest algorithm to tackle a sum problem. I want to see the performance of the streams. — MaverinCode
I think parallel would be faster if there was a way to hint to the stream that the reduction operation could be done with divide and conquer approach. I think what it is currently doing is spawning N threads with each holding a single number, then the terminal operation has to still sequentially evaluate each thread's value and combine adjacent threads. Basically, you are creating a massive stream of workers, but forcing them all to pass through your sum operation sequentially. In other words, the parallel ops should be happening in the reducer (sum), not the producer (stream) — smac89
I suspect this is due (in part) to poor benchmark design / implementation. Re-do the benchmark using jmh (baeldung.com/java-microbenchmark-harness) or similar. There is not a lot of point in analyzing the results of a questionable benchmark. — Stephen C
@smac89 there is no problem in parallel processing of IntStream.rangeClosed(1, 10000000).parallel().sum(); There is no “producer-consumer” thing going on here. — Holger

amos guata · Accepted Answer · 2019-10-12T11:18:26

Using threads doesn't always make the code run faster

When work is spread across several threads, there is always some overhead in managing them: the OS has to assign CPU time to each thread, and on a context switch it has to save and restore the point each thread had reached in its code.

In this specific case

sumParallelStreamThread performs only very simple in-memory operations: it adds up a range of numbers.

A parallel stream does not create a thread per element. It splits the range into chunks and runs them on the worker threads of the common ForkJoinPool. Even so, compared with sumSequentialStreamThread, the parallel version pays extra for splitting the work, scheduling the chunks on other threads and combining the partial results.

sumSequentialStreamThread does the same work without any of that coordination overhead, which is why it can run faster.
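One way to check which threads a parallel stream really uses is to record the thread names while the stream runs. The following is a minimal sketch, not a benchmark: the class name ParallelStreamThreads and the helper sumAndRecordThreads are made up for illustration, and the range is smaller than in the question so that it runs quickly. It uses LongStream because the sum of a large range can exceed the int range.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ForkJoinPool;
import java.util.stream.LongStream;

public class ParallelStreamThreads {
    // Hypothetical helper: sums 1..n in parallel and records the name of
    // every thread that processed at least one element.
    static long sumAndRecordThreads(long n, Set<String> threadNames) {
        return LongStream.rangeClosed(1, n)
                .parallel()
                .map(i -> {
                    threadNames.add(Thread.currentThread().getName());
                    return i;
                })
                .sum();
    }

    public static void main(String[] args) {
        // Concurrent set, because several threads add to it at the same time.
        Set<String> threadNames = ConcurrentHashMap.newKeySet();
        long sum = sumAndRecordThreads(1_000_000, threadNames);

        System.out.println("sum = " + sum);
        System.out.println("common pool parallelism = "
                + ForkJoinPool.commonPool().getParallelism());
        System.out.println("threads used = " + threadNames);
    }
}
```

On a typical multi-core machine the printed set contains the calling thread plus a handful of ForkJoinPool.commonPool-worker threads, far fewer than the number of elements.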

When to use threads

The most common use case for working with threads is when you need to perform a bunch of I/O tasks.

What is considered an I/O task?

It depends on several factors; you can find a debate on it here. I think people will generally agree that making an HTTP request or executing a database query counts as an I/O operation.

Why is it more suitable?

Because I/O operations usually involve some time spent waiting for a response. For example, when querying a database, the thread performing the query waits for the database to return the response (even if that takes less than half a second). While this thread is waiting, a different thread can do other work, and that is where we gain performance.
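The effect of waiting can be sketched without a real database or HTTP endpoint. In the example below, Thread.sleep(100) stands in for an I/O call; the 100 ms latency, the class name WaitingTasksExample and its methods are all assumptions made for illustration, and the timings are rough wall-clock values rather than a proper benchmark.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class WaitingTasksExample {
    // Stand-in for an I/O call (HTTP request, database query): the thread only waits.
    static int simulatedIoCall(int id) throws InterruptedException {
        Thread.sleep(100); // assumed latency, purely illustrative
        return id;
    }

    // Runs the calls one after another on the calling thread; returns elapsed milliseconds.
    static long runSequentially(int tasks) throws InterruptedException {
        long start = System.nanoTime();
        for (int i = 0; i < tasks; i++) {
            simulatedIoCall(i);
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    // Runs the calls on a fixed pool with one thread per task; returns elapsed milliseconds.
    static long runConcurrently(int tasks) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(tasks);
        try {
            long start = System.nanoTime();
            List<Callable<Integer>> calls = new ArrayList<>();
            for (int i = 0; i < tasks; i++) {
                final int id = i;
                calls.add(() -> simulatedIoCall(id));
            }
            // invokeAll blocks until every call has finished.
            for (Future<Integer> result : pool.invokeAll(calls)) {
                result.get(); // rethrows if a call failed
            }
            return (System.nanoTime() - start) / 1_000_000;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("sequential ms: " + runSequentially(4));
        System.out.println("concurrent ms: " + runConcurrently(4));
    }
}
```

With four simulated calls, the sequential run has to wait for each call in turn, while the concurrent run lets the waits overlap, so its elapsed time should be close to that of a single call.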

I find that running tasks that involve only memory and CPU operations in different threads usually makes the code run slower than with one thread, particularly when each task is small.

Benchmark discussion

Regarding the benchmark remarks I see in the comments, I am not sure whether they are correct. In this type of situation I would double-check my benchmark against a profiling tool such as JProfiler or YourKit (or just use one to begin with); they are usually very accurate.
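As a cheap first cross-check before reaching for a tool, the timing loop can be made a little less fragile by adding untimed warm-up runs and by actually using the computed sums. The sketch below assumes this is enough to expose the most obvious distortions; it is not a replacement for a harness such as jmh, which was suggested in the comments. The class name WarmedUpTiming, the sink field and the warm-up and run counts are illustrative choices. It uses LongStream because the sum of 1..10000000 does not fit in an int.

```java
import java.util.function.LongSupplier;
import java.util.stream.LongStream;

public class WarmedUpTiming {
    static final long N = 10_000_000L;

    // Results are written here so the computation is less likely to be
    // optimized away as unused.
    static volatile long sink;

    static long sumSequential() {
        return LongStream.rangeClosed(1, N).sum();
    }

    static long sumParallel() {
        return LongStream.rangeClosed(1, N).parallel().sum();
    }

    // Runs the task `warmup` times untimed, then returns the average
    // nanoseconds per run over `measured` timed runs.
    static long averageNanos(LongSupplier task, int warmup, int measured) {
        for (int i = 0; i < warmup; i++) {
            sink = task.getAsLong();
        }
        long start = System.nanoTime();
        for (int i = 0; i < measured; i++) {
            sink = task.getAsLong();
        }
        return (System.nanoTime() - start) / measured;
    }

    public static void main(String[] args) {
        System.out.println("sequential ns/run: "
                + averageNanos(WarmedUpTiming::sumSequential, 20, 100));
        System.out.println("parallel   ns/run: "
                + averageNanos(WarmedUpTiming::sumParallel, 20, 100));
    }
}
```

Both methods should return the same value as the closed form N * (N + 1) / 2, which gives a quick sanity check that the measured code really does the work.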
