The Javadocs for Collection.(parallelS|s)tream()
and Stream
itself don't answer the question, so it's off to the mailing lists for the rationale. I went through the lambda-libs-spec-observers archives and found one thread specifically about Collection.parallelStream() and another thread that touched on whether java.util.Arrays should provide parallelStream() to match (or actually, whether it should be removed). There was no once-and-for-all conclusion, so perhaps I've missed something from another list or the matter was settled in private discussion. (Perhaps Brian Goetz, one of the principals of this discussion, can fill in anything missing.)
The participants made their points well, so this answer is mostly just an organization of the relevant quotes, with a few clarifications in [brackets], presented in order of importance (as I interpret it).
parallelStream() covers a very common case
Brian Goetz in the first thread, explaining why Collections.parallelStream()
is valuable enough to keep even after other parallel stream factory methods have been removed:
We do not have explicit parallel versions of each of these [stream factories]; we did
originally, and to prune down the API surface area, we cut them on the
theory that dropping 20+ methods from the API was worth the tradeoff of
the surface yuckiness and performance cost of .intRange(...).parallel()
.
But we did not make that choice with Collection.
We could either remove the Collection.parallelStream()
, or we could add
the parallel versions of all the generators, or we could do nothing and
leave it as is. I think all are justifiable on API design grounds.
I kind of like the status quo, despite its inconsistency. Instead of
having 2N stream construction methods, we have N+1 -- but that extra 1
covers a huge number of cases, because it is inherited by every
Collection. So I can justify to myself why having that extra 1 method
is worth it, and why accepting the inconsistency of going no further is
acceptable.
Do others disagree? Is N+1 [Collections.parallelStream() only] the practical choice here? Or should we go
for the purity of N [rely on Stream.parallel()]? Or the convenience and consistency of 2N [parallel versions of all factories]? Or is
there some even better N+3 [Collections.parallelStream() plus other special cases], for some other specially chosen cases we
want to give special support to?
Brian Goetz stands by this position in the later discussion about Arrays.parallelStream()
:
I still really like Collection.parallelStream; it has huge
discoverability advantages, and offers a pretty big return on API
surface area -- one more method, but provides value in a lot of places,
since Collection will be a really common case of a stream source.
parallelStream() is more performant
Brian Goetz:
Direct version [parallelStream()] is more performant, in that it requires less wrapping (to
turn a stream into a parallel stream, you have to first create the
sequential stream, then transfer ownership of its state into a new
Stream.)
In response to Kevin Bourrillion's skepticism about whether the effect is significant, Brian again:
Depends how seriously you are counting. Doug counts individual object
creations and virtual invocations on the way to a parallel operation,
because until you start forking, you're on the wrong side of Amdahl's
law -- this is all "serial fraction" that happens before you can fork
any work, which pushes your breakeven threshold further out. So getting
the setup path for parallel ops fast is valuable.
Doug Lea follows up, but hedges his position:
People dealing with parallel library support need some attitude
adjustment about such things. On a soon-to-be-typical machine,
every cycle you waste setting up parallelism costs you say 64 cycles.
You would probably have had a different reaction if it required 64
object creations to start a parallel computation.
That said, I'm always completely supportive of forcing implementors
to work harder for the sake of better APIs, so long as the
APIs do not rule out efficient implementation. So if killing
parallelStream
is really important, we'll find some way to
turn stream().parallel()
into a bit-flip or somesuch.
Indeed, the later discussion about Arrays.parallelStream()
takes notice of lower Stream.parallel() cost.
stream().parallel() statefulness complicates the future
At the time of the discussion, switching a stream from sequential to parallel and back could be interleaved with other stream operations. Brian Goetz, on behalf of Doug Lea, explains why sequential/parallel mode switching may complicate future development of the Java platform:
I'll take my best stab at explaining why: because it (like the stateful
methods (sort, distinct, limit)) which you also don't like, move us
incrementally farther from being able to express stream pipelines in
terms of traditional data-parallel constructs, which further constrains
our ability to to map them directly to tomorrow's computing substrate,
whether that be vector processors, FPGAs, GPUs, or whatever we cook up.
Filter-map-reduce map[s] very cleanly to all sorts of parallel computing
substrates; filter-parallel-map-sequential-sorted-limit-parallel-map-uniq-reduce
does not.
So the whole API design here embodies many tensions between making it
easy to express things the user is likely to want to express, and doing
is in a manner that we can predictably make fast with transparent cost
models.
This mode switching was removed after further discussion. In the current version of the library, a stream pipeline is either sequential or parallel; last call to sequential()
/parallel()
wins. Besides side-stepping the statefulness problem, this change also improved the performance of using parallel()
to set up a parallel pipeline from a sequential stream factory.
exposing parallelStream() as a first-class citizen improves programmer perception of the library, leading them to write better code
Brian Goetz again, in response to Tim Peierls's argument that Stream.parallel()
allows programmers to understand streams sequentially before going parallel:
I have a slightly different viewpoint about the value of this sequential
intuition -- I view the pervasive "sequential expectation" as one if the
biggest challenges of this entire effort; people are constantly
bringing their incorrect sequential bias, which leads them to do stupid
things like using a one-element array as a way to "trick" the "stupid"
compiler into letting them capture a mutable local, or using lambdas as
arguments to map that mutate state that will be used during the
computation (in a non-thread-safe way), and then, when its pointed out
that what they're doing, shrug it off and say "yeah, but I'm not doing
it in parallel."
We've made a lot of design tradeoffs to merge sequential and parallel
streams. The result, I believe, is a clean one and will add to the
library's chances of still being useful in 10+ years, but I don't
particularly like the idea of encouraging people to think this is a
sequential library with some parallel bags nailed on the side.