5
votes

One of the nice things about LINQ is that infinite data sources are processed lazily, on request. I tried parallelizing my queries and found that lazy loading no longer worked. For example:

using System.Collections.Generic;
using System.Linq;

class Program
{
    static void Main(string[] args)
    {
        var source = Generator();
        var next = source.AsParallel().Select(i => ExpensiveCall(i));
        foreach (var i in next)
        {
            System.Console.WriteLine(i);
        }
    }

    public static IEnumerable<int> Generator()
    {
        int i = 0;
        while (true)
        {
            yield return i;
            i++;
        }
    }

    public static int ExpensiveCall(int arg)
    {
        System.Threading.Thread.Sleep(5000);
        return arg*arg;
    }
}

This program fails to produce any results, presumably because at each step it's waiting for the generator to run dry, which of course never happens. If I take out the AsParallel call, it works just fine. So how do I keep my nice lazy loading while using PLINQ to improve the performance of my applications?


2 Answers

5
votes

Take a look at ParallelMergeOptions:

var next = source.AsParallel()
                 .WithMergeOptions(ParallelMergeOptions.NotBuffered)
                 .Select(i => ExpensiveCall(i));
3
votes

I think you're confusing two different things. The problem here is not lazy loading (i.e. loading only as much as is necessary); it is output buffering (i.e. not returning results immediately).

In your case, you will get your results eventually, although it might take a while (for me, it takes something like 500 results before the first batch is returned). The buffering is done for performance reasons, but in your case it doesn't make sense. As Ian correctly pointed out, you should use .WithMergeOptions(ParallelMergeOptions.NotBuffered) to disable output buffering.
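To see the effect of disabling output buffering, here is a minimal, self-contained sketch based on the code from the question. The class name NotBufferedSketch, the helper TakeFirst, and the 200 ms delay are assumptions made for illustration and are not part of the original program. It prints each result with a timestamp and stops after a fixed number of results; the order of the results is not guaranteed.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Threading;

public static class NotBufferedSketch
{
    // Infinite source, as in the question.
    public static IEnumerable<int> Generator()
    {
        int i = 0;
        while (true)
        {
            yield return i;
            i++;
        }
    }

    // Stand-in for the expensive call; 200 ms is an assumed delay that keeps the demo short.
    public static int ExpensiveCall(int arg)
    {
        Thread.Sleep(200);
        return arg * arg;
    }

    // Hypothetical helper: collects the first `count` results that the query hands back.
    public static List<int> TakeFirst(int count)
    {
        var timer = Stopwatch.StartNew();
        var collected = new List<int>();

        var results = Generator()
            .AsParallel()
            .WithMergeOptions(ParallelMergeOptions.NotBuffered)
            .Select(i => ExpensiveCall(i));

        foreach (var r in results)
        {
            Console.WriteLine($"{timer.ElapsedMilliseconds} ms: {r}");
            collected.Add(r);
            if (collected.Count == count)
            {
                break; // leaving the loop disposes the enumerator, which stops the query
            }
        }

        return collected;
    }

    public static void Main()
    {
        TakeFirst(10);
    }
}
```

With NotBuffered, the first line should appear roughly one ExpensiveCall after the start rather than after a whole batch has accumulated. Removing the WithMergeOptions line lets you compare against the default merge behaviour on your own machine.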

But, as far as I know, PLINQ doesn't do lazy loading and there is no way to change that. What that means is that if your consumer (in your case, the foreach loop) is too slow, PLINQ will generate results faster than necessary, and it will stop only when you finish iterating the results. This means PLINQ can waste CPU time and memory.
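One way to check this on your own machine is to count how many items the query pulls from the source while a slow consumer reads only a few results. The sketch below is an illustration under assumptions: the names ReadAheadSketch, CountingGenerator, and Measure are made up for this example, and the exact numbers will vary with the runtime and the number of cores.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;

public static class ReadAheadSketch
{
    // Number of items the query has requested from the source so far.
    private static int pulled;

    // Infinite source that counts every item it hands out.
    public static IEnumerable<int> CountingGenerator()
    {
        int i = 0;
        while (true)
        {
            Interlocked.Increment(ref pulled);
            yield return i;
            i++;
        }
    }

    // Hypothetical helper: consumes `toConsume` results slowly, then reports both counters.
    public static (int Consumed, int Pulled) Measure(int toConsume)
    {
        pulled = 0;

        var query = CountingGenerator()
            .AsParallel()
            .WithMergeOptions(ParallelMergeOptions.NotBuffered)
            .Select(i => (long)i * i); // cheap projection, so the producers outpace the consumer

        int consumed = 0;
        foreach (var result in query)
        {
            Thread.Sleep(50); // deliberately slow consumer
            if (++consumed == toConsume)
            {
                break; // disposing the enumerator stops the query
            }
        }

        return (consumed, Volatile.Read(ref pulled));
    }

    public static void Main()
    {
        var (consumed, pulledCount) = Measure(5);
        Console.WriteLine($"consumed {consumed}, pulled from source {pulledCount}");
    }
}
```

If the pulled count comes out noticeably larger than the consumed count, the query has read ahead of the consumer, which is the behaviour described above.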