34
votes

Not sure how to call it, but say you have a class that looks like this:

class Person
{
    public string Name;
    public IEnumerable<Person> Friends;
}

You then have a person and you want to "unroll" this structure recursively so you end up with a single list of all people without duplicates.

How would you do this? I have already made something that seems to be working, but I am curious to see how others would do it and especially if there is something built-in to Linq you can use in a clever way to solve this little problem :)


Here is my solution:

public static IEnumerable<T> SelectRecursive<T>(this IEnumerable<T> subjects, Func<T, IEnumerable<T>> selector)
{
    // Stop if subjects are null or empty
    if(subjects == null)
        yield break;

    // For each subject
    foreach(var subject in subjects)
    {
        // Yield it
        yield return subject;

        // Then yield all its decendants
        foreach (var decendant in SelectRecursive(selector(subject), selector))
            yield return decendant;
    }
}

Would be used something like this:

var people = somePerson.SelectRecursive(x => x.Friends);
6
I'm missing something... if you have loops there, will it ever stop?Kobi
@Kobi: This is done by if(!subjects.Any()) yield break;Oliver
@Oliver: No it won't. That will only stop if the subjects list is empty. So I guess I could actually have skipped that part totally since it won't make any difference...Svish
@Kobi: Nope, you are not missing anything. It will never stop :p The stuff I was working with when I made it would never have any cycles so didn't bother doing anything about it. If I needed to, I would probably use a HashSet to keep track of the subjects I had already visited.Svish
Removed the !subjects.Any() part, as it wasn't really do any good and just confuses :pSvish

6 Answers

41
votes

I don't believe there's anything built into LINQ to do this.

There's a problem with doing it recursively like this - you end up creating a large number of iterators. This can be quite inefficient if the tree is deep. Wes Dyer and Eric Lippert have both blogged about this.

You can remove this inefficiency by removing the direct recursion. For example:

public static IEnumerable<T> SelectRecursive<T>(this IEnumerable<T> subjects,
    Func<T, IEnumerable<T>> selector)
{
    if (subjects == null)
    {
        yield break;
    }

    Queue<T> stillToProcess = new Queue<T>(subjects);

    while (stillToProcess.Count > 0)
    {
        T item = stillToProcess.Dequeue();
        yield return item;
        foreach (T child in selector(item))
        {
            stillToProcess.Enqueue(child);
        }
    }
}

This will also change the iteration order - it becomes breadth-first instead of depth-first; rewriting it to still be depth-first is tricky. I've also changed it to not use Any() - this revised version won't evaluate any sequence more than once, which can be handy in some scenarios. This does have one problem, mind you - it will take more memory, due to the queuing. We could probably alleviate this by storing a queue of iterators instead of items, but I'm not sure offhand... it would certainly be more complicated.

One point to note (also noted by ChrisW while I was looking up the blog posts :) - if you have any cycles in your friends list (i.e. if A has B, and B has A) then you'll recurse forever.

13
votes

I found this question as I was looking for and thinking about a similar solution - in my case creating an efficient IEnumerable<Control> for ASP.NET UI controls. The recursive yield I had is fast but I knew that could have extra cost, since the deeper the control structure the longer it could take. Now I know this is O(n log n).

The solution given here provides some answer but, as discussed in the comments, it does change the order (which the OP did not care about). I realized that to preserve the order as given by the OP and as I needed, neither a simple Queue (as Jon used) nor Stack would work since all the parent objects would be yielded first and then any children after them (or vice-versa).

To resolve this and preserve the order I realized the solution would simply be to put the Enumerator itself on a Stack. To use the OPs original question it would look like this:

public static IEnumerable<T> SelectRecursive<T>(this IEnumerable<T> subjects, Func<T, IEnumerable<T>> selector)
{
    if (subjects == null)
        yield break;

    var stack = new Stack<IEnumerator<T>>();

    stack.Push(subjects.GetEnumerator());

    while (stack.Count > 0)
    {
        var en = stack.Peek();
        if (en.MoveNext())
        {
            var subject = en.Current;
            yield return subject;

            stack.Push(selector(subject).GetEnumerator());
        }
        else 
        {
            stack.Pop().Dispose();
        }
    }
}

I use stack.Peek here to keep from having to push the same enumerator back on to the stack as this is likely to be the more frequent operation, expecting that enumerator to provide more than one item.

This creates the same number of enumerators as in the recursive version but will likely be fewer new objects than putting all the subjects in a queue or stack and continuing to add any descendant subjects. This is O(n) time as each enumerator stands on its own (in the recursive version an implicit call to one MoveNext executes MoveNext on the child enumerators to the current depth in the recursion stack).

2
votes

You could use a non-recursive method like this as well:

  HashSet<Person> GatherAll (Person p) {
     Stack<Person> todo = new Stack<Person> ();
     HashSet<Person> results = new HashSet<Person> ();
     todo.Add (p); results.Add (p);
     while (todo.Count > 0) {
        Person p = todo.Pop (); 
        foreach (Person f in p.Friends) 
           if (results.Add (f)) todo.Add (f);
     }
     return results;
  }

This should handle cycles properly as well. I am starting with a single person, but you could easily expand this to start with a list of persons.

2
votes

Here's an implementation that:

  • Does a depth first recursive select,
  • Doesn't require double iteration of the child collections,
  • Doesn't use intermediate collections for the selected elements,
  • Doesn't handle cycles,
  • Can do it backwards.

    public static IEnumerable<T> SelectRecursive<T>(this IEnumerable<T> rootItems, Func<T, IEnumerable<T>> selector)
    {
        return new RecursiveEnumerable<T>(rootItems, selector, false);
    }
    
    public static IEnumerable<T> SelectRecursiveReverse<T>(this IEnumerable<T> rootItems, Func<T, IEnumerable<T>> selector)
    {
        return new RecursiveEnumerable<T>(rootItems, selector, true);
    }
    
    class RecursiveEnumerable<T> : IEnumerable<T>
    {
        public RecursiveEnumerable(IEnumerable<T> rootItems, Func<T, IEnumerable<T>> selector, bool reverse)
        {
            _rootItems = rootItems;
            _selector = selector;
            _reverse = reverse;
        }
    
        IEnumerable<T> _rootItems;
        Func<T, IEnumerable<T>> _selector;
        bool _reverse;
    
        public IEnumerator<T> GetEnumerator()
        {
            return new Enumerator(this);
        }
    
        System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
        {
            return GetEnumerator();
        }
    
        class Enumerator : IEnumerator<T>
        {
            public Enumerator(RecursiveEnumerable<T> owner)
            {
                _owner = owner;
                Reset();
            }
    
            RecursiveEnumerable<T> _owner;
            T _current;
            Stack<IEnumerator<T>> _stack = new Stack<IEnumerator<T>>();
    
    
            public T Current
            {
                get 
                {
                    if (_stack == null || _stack.Count == 0)
                        throw new InvalidOperationException();
                    return _current; 
                }
            }
    
            public void Dispose()
            {
                _current = default(T);
                if (_stack != null)
                {
                    while (_stack.Count > 0)
                    {
                        _stack.Pop().Dispose();
                    }
                    _stack = null;
                }
            }
    
            object System.Collections.IEnumerator.Current
            {
                get { return Current; }
            }
    
            public bool MoveNext()
            {
                if (_owner._reverse)
                    return MoveReverse();
                else
                    return MoveForward();
            }
    
            public bool MoveForward()
            {
                // First time?
                if (_stack == null)
                {
                    // Setup stack
                    _stack = new Stack<IEnumerator<T>>();
    
                    // Start with the root items
                    _stack.Push(_owner._rootItems.GetEnumerator());
                }
    
                // Process enumerators on the stack
                while (_stack.Count > 0)
                {
                    // Get the current one
                    var se = _stack.Peek();
    
                    // Next please...
                    if (se.MoveNext())
                    {
                        // Store it
                        _current = se.Current;
    
                        // Get child items
                        var childItems = _owner._selector(_current);
                        if (childItems != null)
                        {
                            _stack.Push(childItems.GetEnumerator());
                        }
    
                        return true;
                    }
    
                    // Finished with the enumerator
                    se.Dispose();
                    _stack.Pop();
                }
    
                // Finished!
                return false;
            }
    
            public bool MoveReverse()
            {
                // First time?
                if (_stack == null)
                {
                    // Setup stack
                    _stack = new Stack<IEnumerator<T>>();
    
                    // Start with the root items
                    _stack.Push(_owner._rootItems.Reverse().GetEnumerator());
                }
    
                // Process enumerators on the stack
                while (_stack.Count > 0)
                {
                    // Get the current one
                    var se = _stack.Peek();
    
                    // Next please...
                    if (se.MoveNext())
                    {
                        // Get child items
                        var childItems = _owner._selector(se.Current);
                        if (childItems != null)
                        {
                            _stack.Push(childItems.Reverse().GetEnumerator());
                            continue;
                        }
    
                        // Store it
                        _current = se.Current;
                        return true;
                    }
    
                    // Finished with the enumerator
                    se.Dispose();
                    _stack.Pop();
    
                    if (_stack.Count > 0)
                    {
                        _current = _stack.Peek().Current;
                        return true;
                    }
                }
    
                // Finished!
                return false;
            }
    
            public void Reset()
            {
                Dispose();
            }
        }
    }
    
1
votes

use the Aggregate extension...

    List<Person> persons = GetPersons(); 
    List<Person> result = new List<Person>(); 
    persons.Aggregate(result,SomeFunc);

    private static List<Person> SomeFunc(List<Person> arg1,Person arg2)
    {
    arg1.Add(arg2)
    arg1.AddRange(arg2.Persons);
    return arg1;
    }
0
votes

Recursion is always fun. Perhaps you could simplify your code to:

public static IEnumerable<T> SelectRecursive<T>(this IEnumerable<T> subjects, Func<T, IEnumerable<T>> selector) {
    // Stop if subjects are null or empty 
    if (subjects == null || !subjects.Any())
        return Enumerable.Empty<T>();

    // Gather a list of all (selected) child elements of all subjects
    var subjectChildren = subjects.SelectMany(selector);

    // Jump into the recursion for each of the child elements
    var recursiveChildren = SelectRecursive(subjectChildren, selector);

    // Combine the subjects with all of their (recursive child elements).
    // The union will remove any direct parent-child duplicates.
    // Endless loops due to circular references are however still possible.
    return subjects.Union(recursiveChildren);
}

It will result in less duplicates than your original code. However their might still be duplicates causing an endless loop, the union will only prevent direct parent(s)-child(s) duplicates.

And the order of the items will be different from yours :)

Edit: Changed the last line of code to three statements and added a bit more documentation.