2
votes

I want to test how much faster Java8 parallel stream so I wrote a program .The program count the number of prime numbers in a given list of numbers. The program counts prime numbers in there ways:

  1. using for loop;
  2. by using lambda expression;
  3. by using lambda expression(parallel stream).

Before executing the program I was expecting that parallel stream version should be faster. But the result is

Total prime no found 664579 in 4237 mili sec ----for loop version
Total prime no found 664579 in 2440 mili sec ----parallel stream
Total prime no found 664579 in 2166 mili sec ----lambda expression

My doubt is why parallel version is slower then lambda version

List<Integer> numbers = new ArrayList<>();
        for (int i = 0; i < 10000000; i++) {
            numbers.add(i);
        }
        Stopwatch stopwatch = Stopwatch.createStarted();
        int counter = 0;
        for (int number : numbers) {
            if (isPrime(number)) {
                counter++;
            }
        }
        stopwatch.stop();
        System.out.println("Total prime no found " + counter + " in " + stopwatch.elapsed(TimeUnit.MILLISECONDS) + " mili sec");

        stopwatch = Stopwatch.createStarted();
        long count1 = numbers.parallelStream().filter(n -> isPrime(n)).count();
        stopwatch.stop();
        System.out.println("Total prime no found " + count1 + " in " + stopwatch.elapsed(TimeUnit.MILLISECONDS) + " mili sec");

        stopwatch = Stopwatch.createStarted();
        long count2 = numbers.stream().filter(n -> isPrime(n)).count();
        System.out.println("Total prime no found " + count2 + " in " + stopwatch.elapsed(TimeUnit.MILLISECONDS) + " mili sec");
        stopwatch.stop();

The above program uses google Guava library to calculate time elapsed.

2
while doing benchmarking you should also take care of warmup periodjmj
Also: count1 and count2 are identical...Brian S
I changed it but the results a inconsistentAshok_Pradhan
Total prime no found 664579 in 4178 mili sec --loop<br/> Total prime no found 664579 in 2597 mili sec --parallel stream<br/> Total prime no found 664579 in 4187 mili sec --lambda expression<br/> Total prime no found 664579 in 4198 mili sec --loop<br/> Total prime no found 664579 in 4690 mili sec --in parallel stream<br/> Total prime no found 664579 in 4195 mili sec --in lambda expression<br/> Total prime no found 664579 in 4252 mili sec --loop<br/> Total prime no found 664579 in 2455 mili sec --parallel<br/> Total prime no found 664579 in 4555 mili sec --lambda expression<br/>Ashok_Pradhan
Your benchmarking methodology is a bit simplistic. You should use a framework such as jmh.assylias

2 Answers

4
votes

Most likely, the problem is that during your test the JIT compiler (re-)compiles code. That means that your comparison isn't fair, because the later tests benefit from compilations caused by the earlier tests.

This is a frequently made mistake of micro benchmarks. There are a lot of articles explaining the problem, e.g. Robust Java benchmarking. If I add some code to warmup the JIT first, the results are expected. My main method looks as follows:

public static void main(String... args) {
    System.out.println("Warmup...");
    for (int i = 0; i < 5000; ++i) {
        run(Demo::testLoop, 5000);
        run(Demo::testStream, 5000);
        run(Demo::testParallel, 5000);
    }
    System.out.println("Benchmark...");
    int bound = 10000000;
    System.out.printf("Loop:     %s\n", run(Demo::testLoop, bound));
    System.out.printf("Stream:   %s\n", run(Demo::testStream, bound));
    System.out.printf("Parallel: %s\n", run(Demo::testParallel, bound));
}

And the output looks as follows:

Loop:     7.06s (664579)
Stream:   7.06s (664579)
Parallel: 3.84s (664579)

If you pass the option -XX:+PrintCompilation to the Java VM, you can see when and where the JIT kicks in, and that almost all compilations now happen during the warmup phase.

Note that parallel streams are not the best solution for this kind of parallelization, because the complexity of isPrime depends on the value. That means, the first half of the sequence requires significantly less work than the second half (and so on).

For reference, here are the remaining methods of my implementation:

public static boolean isPrime(int value) {
    for (int i = 2; i * i <= value; ++i)
        if (value % i == 0) return false;
    return true;
}

public static long testLoop(int bound) {
    long count = 0;
    for (int i = 2; i < bound; ++i)
        if (isPrime(i)) ++count;
    return count;
}

public static long testStream(int bound) {
    return IntStream.range(2, bound).filter(Demo::isPrime).count();
}

public static long testParallel(int bound) {
    return IntStream.range(2, bound).parallel().filter(Demo::isPrime).count();
}

public static String run(IntToLongFunction operation, int bound) {
    long start = System.currentTimeMillis();
    long count = operation.applyAsLong(bound);
    long millis = System.currentTimeMillis() - start;
    return String.format("%4.2fs (%d)", millis / 1000.0, count);
}
0
votes

It is a well known fact that the F/J framework needs to warm up. The code is written in such a fashion that it only performs well when compiled. You also have to consider the time it takes to create threads. Having a warm up period in a production environment is subjective.

There is a lot of awful code in the framework to overcome this sluggish behavior when first starting.