2
votes

Consider I want to traverse some of the nodes of a tree-like structure using Stream API (similar questions: [1], [2], [3]). The first implementation coming to mind would be:

abstract class Node {
    abstract Collection<Node> getChildren();

    final Stream<Node> stream() {
        return Stream.concat(Stream.of(this), this.getChildren().stream().flatMap(Node::stream));
    }
}

The above stream() implementation has the following features:

  • It's "almost lazy", in a sense that only the immediate children of the root node will be retrieved during a stream creation (children of other internal nodes will be queried as necessary).
  • It exhibits a DFS traversal order.

Now consider I have a large hierarchy, getChildren() operation is costly, and I'm looking for any Node matching some predicate:

final Node tree = ...;
final Predicate<Node> p = ...;
tree.stream().filter(p).findAny();
  1. How do I make my stream() implementation "completely lazy"? I don't want children of a node queried if the node itself already matches the predicate. I'm looking for a lazy version of Stream.concat() with a signature of (Stream<T> a, Supplier<Stream<T>> b).
  2. How do I implement a BFS traversal order instead (using Stream API)?
3
Have you tried implementing a spliterator?the8472
@the8472 That's what I'm about to do. Just was looking for a simpler solution.Bass
Consider this question. In short, the implementation is feasible and it’s not the concat (alone) but the current implementation of flatMap that affects the laziness. You have to think carefully whether you want to fix what the JRE broke. If you want to proceed, have a look at this answerHolger

3 Answers

2
votes

Unfortunately, I can only answer your first question. Again, flatMap is your friend here. It enables us to create a slightly different concat method that accepts stream-Suppliers instead of just streams:

abstract class Node {
    abstract Collection<Node> getChildren();

    Stream<Node> lazyTraverse() {
        return Node.concat(() -> Stream.of(this),
                           () -> getChildren().stream().flatMap(Node::lazyTraverse));
    }

    static Stream<Node> concat(Supplier<Stream<Node>> a, Supplier<Stream<Node>> b) {
        return Stream.of(a, b).flatMap(Supplier::get);
    }
}

An even better solution would be if you could replace getChildren by some mechanism that returns a lazy Stream<Node> instead and can be used directly in your original tree traversal algorithm. A lazy stream is much more suitable than a slow getter.

Some words on your second question:

I don't know if there is BFS traversal algorithm using the streaming-API in an elegant way, but I tend to say 'no' as BFS usually requires additional memory to store all nodes that have been visited but not yet traversed.

1
votes

You can use the Spliterators library and the low level stream support primitives. You can then provide an Iterator over your nodes which does consume nodes only one by one.

return StreamSupport.stream( Spliterators.spliteratorUnknownSize( new Iterator<Node>()
{
    @Override
    public boolean hasNext()
    {
        // to implement
        return ...;
    }

    @Override
    public ContentVersion next()
    {
        // to implement
        return ...;
    }
}, 0 ), false );
1
votes

A bit ugly, but this ought to work for Java 8:

public static <N> Stream<N> breadthFirst(N start, Function<? super N, Stream<N>> getChildren) {
    final LinkedList<Stream<N>> generations = new LinkedList<>();
    generations.add(Stream.of(start));
    final Iterator<Stream<N>> genIterator = createIterator(generations::remove, () -> !generations.isEmpty());
    return StreamSupport.stream(Spliterators.spliteratorUnknownSize(genIterator, Spliterator.ORDERED), false)
            .flatMap(Function.identity())
            .distinct() // avoids loops
            .peek(n -> generations.add(getChildren.apply(n)));
}

public static <E> Iterator<E> createIterator(Supplier<E> supplier, BooleanSupplier hasNext) {
    return new Iterator<E>() {
        @Override
        public boolean hasNext() {
            return hasNext.getAsBoolean();
        }
        @Override
        public E next() {
            return supplier.get();
        }
    };
}

The idea here is that you need to hang onto references to subsequent generations, so we create a list to hold them while processing. With Java 9 you'll be able to replace the custom iterator code with Stream.generate(generations::poll).takeWhile(Objects::nonNull).