106
votes

I just came across a question when using a List and its stream() method. While I know how to use them, I'm not quite sure about when to use them.

For example, I have a list, containing various paths to different locations. Now, I'd like to check whether a single, given path contains any of the paths specified in the list. I'd like to return a boolean based on whether or not the condition was met.

This, of course, is not a hard task per se. But I wonder whether I should use streams or a for(-each) loop.

The List

private static final List<String> EXCLUDE_PATHS = Arrays.asList(
    "my/path/one",
    "my/path/two"
);

Example using Stream:

private boolean isExcluded(String path) {
    return EXCLUDE_PATHS.stream()
                        .map(String::toLowerCase)
                        .filter(path::contains)
                        .collect(Collectors.toList())
                        .size() > 0;
}

Example using for-each loop:

private boolean isExcluded(String path){
    for (String excludePath : EXCLUDE_PATHS) {
        if (path.contains(excludePath.toLowerCase())) {
            return true;
        }
    }
    return false;
}

Note that the path parameter is always lowercase.

My first guess is that the for-each approach is faster, because the loop returns immediately once the condition is met, whereas the stream would still iterate over all list entries in order to complete the filtering.

Is my assumption correct? If so, why (or rather when) would I use stream() then?

Streams are more expressive and readable than traditional for-loops. In the latter you need to be careful about the intrinsics of if-then and conditions, etc. The stream expression is very clear: convert filenames to lower case, then filter by something, and then count, collect, etc. the result: a very iterative expression of the flow of computations. - Jean-Baptiste Yunès
There is no need for new String[]{…} here. Just use Arrays.asList("my/path/one", "my/path/two"). - Holger
If your source is a String[], there is no need to call Arrays.asList. You can just stream over the array using Arrays.stream(array). By the way, I have difficulty understanding the purpose of the isExcluded test altogether. Is it really interesting whether an element of EXCLUDE_PATHS is literally contained somewhere within path? I.e. isExcluded("my/path/one/foo/bar/baz") will return true, as well as isExcluded("foo/bar/baz/my/path/one/"). - Holger
Great, I wasn't aware of the Arrays.stream method, thanks for pointing that out. Indeed, the example I posted seems quite useless for anyone besides me. I'm aware of the behaviour of the isExcluded method, but it's really just something I need for myself. So, to answer your question: yes, it is interesting, for reasons I would rather not go into, as they would not fit into the scope of the original question. - mcuenez
On our team, we find that stream usage causes more issues and more bugs than for loops etc. This is because not all of the team are stream experts, and stream code is more cryptic: you can't guess what is happening if you are not a stream expert, but anyone can read a for loop and if statements. So on our team, we favor longer, explicit code over single lines of code that have a lot of functionality but where no one really knows if/how it works and whether there are performance issues. Other teams may prefer shortness of code. - John Little

5 Answers

85
votes

Your assumption is correct. Your stream implementation is slower than the for-loop.

This stream usage should be as fast as the for-loop though:

EXCLUDE_PATHS.stream()  
    .map(String::toLowerCase)
    .anyMatch(path::contains);

This iterates through the items, applying String::toLowerCase and the filter to the items one-by-one and terminating at the first item that matches.

Both collect() & anyMatch() are terminal operations. anyMatch() exits at the first found item, though, while collect() requires all items to be processed.
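
To make the difference visible, here is a minimal sketch that counts how many list elements each pipeline actually touches. The class name ShortCircuitDemo, the two helper methods, and the use of peek() with a counter are my own additions for illustration and are not part of the original code.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;

class ShortCircuitDemo {
    // Same list as in the question.
    static final List<String> EXCLUDE_PATHS = Arrays.asList(
        "my/path/one",
        "my/path/two"
    );

    // Number of elements pulled through the pipeline when collect() is the terminal operation.
    static int visitedWithCollect(String path) {
        AtomicInteger visited = new AtomicInteger();
        EXCLUDE_PATHS.stream()
                     .peek(p -> visited.incrementAndGet())
                     .map(String::toLowerCase)
                     .filter(path::contains)
                     .collect(Collectors.toList());
        return visited.get();
    }

    // Number of elements pulled through the pipeline when anyMatch() is the terminal operation.
    static int visitedWithAnyMatch(String path) {
        AtomicInteger visited = new AtomicInteger();
        EXCLUDE_PATHS.stream()
                     .peek(p -> visited.incrementAndGet())
                     .map(String::toLowerCase)
                     .anyMatch(path::contains);
        return visited.get();
    }

    public static void main(String[] args) {
        String path = "some/prefix/my/path/one"; // matches the first list entry
        System.out.println("collect visited:  " + visitedWithCollect(path));
        System.out.println("anyMatch visited: " + visitedWithAnyMatch(path));
    }
}
```

With a path that matches the first entry, the collect() version still visits both entries, while the anyMatch() version stops after the first one. If nothing matches, both have to look at every entry.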

34
votes

The decision whether to use Streams or not should not be driven by performance consideration, but rather by readability. When it really comes to performance, there are other considerations.

With your .filter(path::contains).collect(Collectors.toList()).size() > 0 approach, you are processing all elements and collecting them into a temporary List before comparing the size. Still, this hardly ever matters for a Stream consisting of two elements.

Using .map(String::toLowerCase).anyMatch(path::contains) can save CPU cycles and memory, if you have a substantially larger number of elements. Still, this converts each String to its lowercase representation, until a match is found. Obviously, there is a point in using

private static final List<String> EXCLUDE_PATHS =
    Stream.of("my/path/one", "my/path/two").map(String::toLowerCase)
          .collect(Collectors.toList());

private boolean isExcluded(String path) {
    return EXCLUDE_PATHS.stream().anyMatch(path::contains);
}

instead, so you don’t have to repeat the conversion to lowercase in every invocation of isExcluded. If the number of elements in EXCLUDE_PATHS or the lengths of the strings become really large, you may consider using

private static final List<Predicate<String>> EXCLUDE_PATHS =
    Stream.of("my/path/one", "my/path/two").map(String::toLowerCase)
          .map(s -> Pattern.compile(s, Pattern.LITERAL).asPredicate())
          .collect(Collectors.toList());

private boolean isExcluded(String path){
    return EXCLUDE_PATHS.stream().anyMatch(p -> p.test(path));
}

Compiling a string as a regex pattern with the LITERAL flag makes it behave just like ordinary string operations, but allows the engine to spend some time in preparation, e.g. using the Boyer-Moore algorithm, to be more efficient when it comes to the actual comparison.

Of course, this only pays off if there are enough subsequent tests to compensate for the time spent in preparation. Determining whether this will be the case is one of the actual performance considerations, besides the first question of whether this operation will ever be performance-critical at all. The question of whether to use Streams or for loops is not.
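
As a rough sanity check that a LITERAL pattern really matches like String.contains, here is a small sketch. The class name LiteralPatternDemo, the method names, and the second list entry "my/path.two" (chosen because '.' is a regex metacharacter) are assumptions made for this illustration. It only compares behaviour and says nothing about speed.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class LiteralPatternDemo {
    // Hypothetical entries; the second one contains '.', which is special in a regex.
    static final List<String> RAW = Arrays.asList("my/path/one", "my/path.two");

    // Precompiled predicates; LITERAL makes every character match itself.
    static final List<Predicate<String>> LITERAL = RAW.stream()
        .map(s -> Pattern.compile(s, Pattern.LITERAL).asPredicate())
        .collect(Collectors.toList());

    static boolean excludedByContains(String path) {
        return RAW.stream().anyMatch(path::contains);
    }

    static boolean excludedByLiteralPattern(String path) {
        return LITERAL.stream().anyMatch(p -> p.test(path));
    }

    // For contrast only: without LITERAL, '.' matches any character.
    static boolean excludedByPlainRegex(String path) {
        return RAW.stream().anyMatch(s -> Pattern.compile(s).matcher(path).find());
    }

    public static void main(String[] args) {
        String path = "a/my/pathXtwo/b";
        System.out.println(excludedByContains(path));       // false
        System.out.println(excludedByLiteralPattern(path)); // false
        System.out.println(excludedByPlainRegex(path));     // true
    }
}
```

The literal predicate agrees with contains for both inputs, whereas the plain regex treats the '.' as a wildcard and reports a match for "a/my/pathXtwo/b".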

By the way, the code examples above keep the logic of your original code, which looks questionable to me. Your isExcluded method returns true if the specified path contains any of the elements in the list, so it returns true for /some/prefix/to/my/path/one, as well as my/path/one/and/some/suffix or even /some/prefix/to/my/path/one/and/some/suffix.

Even dummy/path/onerous is considered to fulfil the criteria, as it contains the string my/path/one.
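
The following sketch reproduces that behaviour. It also shows one possible stricter variant, in case matching on whole path segments is what you actually want. The class name SubstringMatchDemo and the method isExcludedBySegment are hypothetical, and the stricter variant is only my guess at an alternative, not something your requirements call for.

```java
import java.util.Arrays;
import java.util.List;

class SubstringMatchDemo {
    // Same list as in the question (already lowercase).
    static final List<String> EXCLUDE_PATHS = Arrays.asList(
        "my/path/one",
        "my/path/two"
    );

    // Original logic: plain substring search.
    static boolean isExcluded(String path) {
        return EXCLUDE_PATHS.stream().anyMatch(path::contains);
    }

    // Hypothetical stricter variant: the excluded path must line up with '/' boundaries.
    static boolean isExcludedBySegment(String path) {
        String padded = "/" + path + "/";
        return EXCLUDE_PATHS.stream().anyMatch(p -> padded.contains("/" + p + "/"));
    }

    public static void main(String[] args) {
        System.out.println(isExcluded("/some/prefix/to/my/path/one"));          // true
        System.out.println(isExcluded("dummy/path/onerous"));                   // true
        System.out.println(isExcludedBySegment("/some/prefix/to/my/path/one")); // true
        System.out.println(isExcludedBySegment("dummy/path/onerous"));          // false
    }
}
```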

21
votes

Yeah. You are right. Your stream approach will have some overhead. But you may use such a construction:

private boolean isExcluded(String path) {
    return  EXCLUDE_PATHS.stream().map(String::toLowerCase).anyMatch(path::contains);
}

The main reason to use streams is that they make your code simpler and easier to read.

9
votes

The goal of streams in Java is to simplify the complexity of writing parallel code. It's inspired by functional programming. The serial stream is just to make the code cleaner.

If we want performance, we should use parallelStream, which was designed for that purpose. The serial one, in general, is slower.

There is a good article to read about ForLoop, Stream and ParallelStream Performance.

In your code, we can use a short-circuiting terminal operation such as anyMatch to stop the search at the first match.

1
votes

Others have already made many good points, but I just want to mention lazy evaluation in streams. When we call map() to create a stream of lowercase paths, we are not creating the whole stream immediately. Instead, the stream is constructed lazily, which is why the performance should be equivalent to the traditional for loop. It does not do a full scan: map() and anyMatch() are applied to each element in turn, one element at a time. Once anyMatch() finds a match, the pipeline short-circuits and returns true.
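
A small sketch can make the interleaving visible by recording the order in which the two lambdas run. The class name LazyTraceDemo, the trace method, and the three-element list (with "my/path/zero" added as a hypothetical non-matching first entry) are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class LazyTraceDemo {
    // Hypothetical list: the first entry will not match, the second will.
    static final List<String> EXCLUDE_PATHS = Arrays.asList(
        "my/path/zero",
        "my/path/one",
        "my/path/two"
    );

    // Returns the sequence of map/test calls made while checking the given path.
    static List<String> trace(String path) {
        List<String> events = new ArrayList<>();
        EXCLUDE_PATHS.stream()
            .map(s -> {
                events.add("map " + s);
                return s.toLowerCase();
            })
            .anyMatch(s -> {
                events.add("test " + s);
                return path.contains(s);
            });
        return events;
    }

    public static void main(String[] args) {
        // Each element is mapped and tested before the next one is touched.
        trace("x/my/path/one").forEach(System.out::println);
    }
}
```

For the path "x/my/path/one" the recorded events are "map my/path/zero", "test my/path/zero", "map my/path/one", "test my/path/one". The third entry is never mapped or tested, which is the same work the for-each loop would do.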