2 votes

I have a WebRole running on a small instance. This WebRole has a method that uploads a large number of files to BLOB storage. According to the Azure instance specs, a small instance has only 1 core. So when uploading those blobs, will Parallel.ForEach give me any benefit over a regular foreach?


3 Answers

5 votes

You would be much better served by focusing on using the async versions of the blob storage APIs and/or Stream APIs so that you are I/O bound rather than CPU bound. Anywhere there is a BeginXXX API, you should use it by wrapping it up with Task.Factory.FromAsync and then using a continuation from there. In your specific case you should leverage CloudBlob.BeginUploadFromStream. How you get the stream initially is just as important, so look for async APIs on that end too.
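
To make the general pattern concrete, here's a minimal sketch of wrapping an arbitrary Begin/End pair with Task.Factory.FromAsync, using Stream.BeginRead/EndRead as the example (sourceStream and buffer are hypothetical placeholders, not part of the blob APIs):

// Wrap a classic Begin/End pair in a Task via FromAsync
// (sourceStream is a hypothetical System.IO.Stream)
byte[] buffer = new byte[4096];
Task<int> readTask = Task<int>.Factory.FromAsync(
    sourceStream.BeginRead,
    sourceStream.EndRead,
    buffer,
    0,
    buffer.Length,
    null);

// Continue processing once the read completes, without blocking a thread
readTask.ContinueWith(antecedent =>
{
    int bytesRead = antecedent.Result;
    // ... process the buffer here ...
});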

The only thing that may hold you back from using a small instance after that is that it's capped at 100 Mbps, whereas a medium instance gets 200 Mbps. Then again, you can always leverage the elasticity factor: increase the role count when you need more throughput and scale back when things calm down.

Here's an example of how you would call BeginUploadFromStream using FromAsync. As far as coordinating concurrent processing goes: since you're now kicking off async tasks, you can't count on Parallel::ForEach to constrain the max concurrency for you. Instead you use a regular foreach on the original thread with a Semaphore to limit concurrency, which provides the equivalent of MaxDegreeOfParallelism:

// Set up a semaphore to constrain the max # of concurrent "thing"s we will process
int maxConcurrency = ... read from config ...
Semaphore maxConcurrentThingsToProcess = new Semaphore(maxConcurrency, maxConcurrency);

// Current thread enumerates and dispatches the I/O work asynchronously; this is the only CPU resource we hold during the async I/O
foreach(Thing thing in myThings)
{
    // Make sure we haven't reached max concurrency yet
    maxConcurrentThingsToProcess.WaitOne();

    try
    {
        Stream mySourceStream = ... get the source stream from somewhere ...;
        CloudBlob myCloudBlob = ... get the blob from somewhere ...;

        // Begin uploading the stream asynchronously
        Task uploadStreamTask = Task.Factory.FromAsync(
            myCloudBlob.BeginUploadFromStream,
            myCloudBlob.EndUploadFromStream,
            mySourceStream,
            null);

        // Set up a continuation that will fire when the upload completes (regardless of success or failure)
        uploadStreamTask.ContinueWith(uploadStreamAntecedent =>
        {
            try
            {
                // Touch the Exception property so a failed upload isn't
                // left as an unobserved task exception, then do any
                // cleanup/post processing
                if(uploadStreamAntecedent.Exception != null)
                {
                    // ... log/handle the failure ...
                }

                // Done with the source stream now
                mySourceStream.Dispose();
            }
            finally
            {
                // Release the semaphore so the next thing can be processed
                maxConcurrentThingsToProcess.Release();
            }
        });
    }
    catch
    {
        // Something went wrong starting to process this "thing", release the semaphore
        maxConcurrentThingsToProcess.Release();

        throw;
    }
}
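
One detail worth noting about the loop above: it returns as soon as the last upload has been started, not when it has finished. If you need to block until everything has drained, you can reacquire every semaphore slot after the loop:

// Wait for all in-flight uploads to release their slots before moving on
for(int i = 0; i < maxConcurrency; i++)
{
    maxConcurrentThingsToProcess.WaitOne();
}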

Now, this sample doesn't show how you would also get the source stream asynchronously, but if, for example, you were downloading that stream from a URL someplace else, you would want to kick that off asynchronously as well and chain the start of the async upload into a continuation on that download, as sketched below.
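
Here's a rough sketch of that chaining, assuming purely for illustration that the source is an HTTP URL (sourceUrl is hypothetical; myCloudBlob is obtained as before):

// Kick off the download asynchronously (requires System.Net)
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(sourceUrl);

Task<WebResponse> downloadTask = Task<WebResponse>.Factory.FromAsync(
    request.BeginGetResponse,
    request.EndGetResponse,
    null);

// Chain the start of the async upload onto the download's completion;
// Unwrap turns the resulting Task<Task> into a single Task to track
Task uploadTask = downloadTask.ContinueWith(downloadAntecedent =>
{
    Stream sourceStream = downloadAntecedent.Result.GetResponseStream();

    return Task.Factory.FromAsync(
        myCloudBlob.BeginUploadFromStream,
        myCloudBlob.EndUploadFromStream,
        sourceStream,
        null);
}).Unwrap();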

Believe me, I know this is more code than just doing a simple Parallel::ForEach, but Parallel::ForEach exists to make concurrency for CPU-bound work easy. When it comes to I/O, using the async APIs is the only way to achieve maximum I/O throughput while minimizing CPU resources.

3 votes

The number of cores doesn't directly correlate to the number of threads spawned by Parallel.ForEach().
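
You can see this for yourself with a small, self-contained sketch (my illustration, not part of the test cited below) that counts how many distinct threads Parallel.ForEach() actually uses when the loop body blocks the way I/O does:

using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

class ParallelThreadCount
{
    static void Main()
    {
        var threadIds = new ConcurrentDictionary<int, bool>();

        Parallel.ForEach(Enumerable.Range(0, 50), i =>
        {
            threadIds[Thread.CurrentThread.ManagedThreadId] = true;
            Thread.Sleep(200); // stand-in for blocking I/O
        });

        // Even on a single-core box this typically prints more threads than
        // cores, because the thread pool injects threads when work items block
        Console.WriteLine("Cores: {0}, distinct threads: {1}",
            Environment.ProcessorCount, threadIds.Count);
    }
}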

About a year ago, David Aiken did a very informal test with some blob+table access, with and without Parallel.ForEach(), on a Small instance. You can see the results here. In this case, there was a measured improvement, as this was not a CPU-bound activity. I suspect you'll see an improvement in performance as well, since you're uploading a large number of objects to blob storage.

3 votes

Yes, it will, because each of your uploads will be network-bound, so the scheduler can share your single core among them. (This, after all, is how single-core, single-CPU computers get more than one thing done at a time.)
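
For instance, a minimal sketch with the v1 StorageClient library, capping the concurrency explicitly (filePaths and container are hypothetical stand-ins for your own file list and blob container):

// Cap the concurrency explicitly rather than letting the scheduler guess
var options = new ParallelOptions { MaxDegreeOfParallelism = 8 };

Parallel.ForEach(filePaths, options, path =>
{
    CloudBlob blob = container.GetBlobReference(Path.GetFileName(path));

    // Synchronous upload: while this thread waits on the network,
    // the scheduler runs the other uploads on the same core
    blob.UploadFile(path);
});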

You could also use the asynchronous blob upload functions for a similar effect.