15
votes

I'm downloading a blob from blob storage that is 1GB in size.

If I use Azure Storage Explorer, it takes under 10 minutes (I have a 20 Mbit/s downlink).

However when I use code:

await blobRef.DownloadToFileAsync("D:\\temp\\data.mdf", FileMode.Create);

(I've also tried using an in-memory stream), it takes over an hour to download 250 MB (at which point I killed it). I've run this test multiple times and it happens consistently.

I also monitored the network traffic.

  • Via Storage Explorer, the downstream network traffic is around 20 Mbit/s
  • Via code, the downstream network traffic is around 1 Mbit/s
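
As a rough sanity check on those figures, here is a minimal sketch of the expected transfer times. It assumes decimal units (1 GB = 10^9 bytes), ignores protocol overhead, and the `TransferEstimate` helper is a hypothetical name used only for illustration.

```csharp
using System;

public static class TransferEstimate
{
    // Seconds needed to move `bytes` over a link running at `megabitsPerSecond`.
    // Assumes decimal megabits and no protocol overhead.
    public static double Seconds(long bytes, double megabitsPerSecond)
        => bytes * 8.0 / (megabitsPerSecond * 1_000_000.0);

    public static void Main()
    {
        // Full 1 GB blob at the 20 Mbit/s seen with Storage Explorer.
        Console.WriteLine(TimeSpan.FromSeconds(Seconds(1_000_000_000L, 20)));

        // 250 MB at the roughly 1 Mbit/s seen from code.
        Console.WriteLine(TimeSpan.FromSeconds(Seconds(250_000_000L, 1)));
    }
}
```

At 20 Mbit/s the whole blob needs about 400 seconds, which fits the "under 10 minutes" observation. At a full 1 Mbit/s, 250 MB would need about 2000 seconds, so taking over an hour suggests the sustained rate from code was somewhat below 1 Mbit/s.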

EDIT: I'm still using an old version of Azure Storage Explorer (1.4.1). But I can confirm new versions are also giving the same results.

3
What about "Avoid having async void. Return a Task like public async Task DownloadAsync(string path, string[] names) so you can await that method"? – Uzay
I don't have async void. It's an async method in an async main. – Murdock
Probably different encryption or zip methods. – ralf.w.
I can confirm this behavior. Downloading via Azure Storage Explorer, my file downloads in 01:09.87, but via the DownloadToFileAsync method it takes 03:04:01. Maybe Azure Storage Explorer splits the download into chunks and then downloads them in parallel? – jared

3 Answers

12
votes

You should specify which version of Azure Storage Explorer you're using.

If you're using one of the newer versions, such as 1.9.0 / 1.8.1 / 1.8.0 (please find more details in this link), then Azure Storage Explorer is integrated with AzCopy, which uses simple commands designed for optimal performance. That is why you get good performance for downloads, uploads, etc.

When downloading or uploading blobs from code, you can make use of the Microsoft Azure Storage Data Movement Library. This library is based on the core data movement framework that powers AzCopy, so it also gives you high-performance uploads and downloads.
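
A minimal sketch of what a download through the Data Movement Library might look like, assuming the `Microsoft.Azure.Storage.DataMovement` NuGet package (older releases use the `Microsoft.WindowsAzure.Storage` namespaces instead). The `DataMovementDownload` class, its parameters, and the value 16 for `ParallelOperations` are assumptions for illustration, not recommended settings.

```csharp
using System;
using System.Net;
using System.Threading.Tasks;
using Microsoft.Azure.Storage;
using Microsoft.Azure.Storage.Blob;
using Microsoft.Azure.Storage.DataMovement;

public static class DataMovementDownload
{
    // Hypothetical helper: downloads one blob to a local file via TransferManager.
    public static async Task DownloadAsync(
        string connectionString, string containerName, string blobName, string destinationPath)
    {
        // Allow more concurrent HTTP connections (mainly relevant on .NET Framework).
        ServicePointManager.DefaultConnectionLimit = Environment.ProcessorCount * 8;
        ServicePointManager.Expect100Continue = false;

        CloudBlockBlob blob = CloudStorageAccount.Parse(connectionString)
            .CreateCloudBlobClient()
            .GetContainerReference(containerName)
            .GetBlockBlobReference(blobName);

        // Number of chunks transferred in parallel; 16 is an arbitrary starting point.
        TransferManager.Configurations.ParallelOperations = 16;

        // Report progress so throughput can be compared against Storage Explorer.
        var context = new SingleTransferContext
        {
            ProgressHandler = new Progress<TransferStatus>(
                p => Console.WriteLine($"{p.BytesTransferred} bytes transferred"))
        };

        await TransferManager.DownloadAsync(blob, destinationPath, new DownloadOptions(), context);
    }
}
```

It would be called as `await DataMovementDownload.DownloadAsync(connectionString, "mycontainer", "data.mdf", "D:\\temp\\data.mdf");`, where the container name is a placeholder.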

6
votes

I eventually tried 2 solutions proposed by @Ivan and @mjwills:

Both solutions were much faster than the original DownloadToFileAsync. DownloadToFileParallelAsync is only available in later versions of the library, and hence was not available in the version I had installed.

4
votes

I'd suggest using DownloadToFileParallelAsync.

As per the docs:

Initiates an asynchronous operation to download the contents of a blob to a file by making parallel requests.

and:

The parallelIOCount and rangeSizeInBytes should be adjusted depending on the CPU, memory, and bandwidth.

This API should only be used for larger downloads as a HEAD request is made prior to downloading the data.

For smaller blobs, please use DownloadToFileAsync().

To get the best performance, it is recommended to try several values, and measure throughput.

One place to start would be to set the parallelIOCount to the number of CPUs.

Then adjust the rangeSizeInBytes so that parallelIOCount times rangeSizeInBytes equals the amount of memory you want the process to consume.
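
The sizing rule above can be sketched as a small calculation. This is only an illustration: `ParallelDownloadTuning` and `RangeSizeFor` are hypothetical names, the 256 MiB budget is an arbitrary example, and the rounding assumes (as I understand the docs) that rangeSizeInBytes must be a multiple of 4 KB and at least 4 MB.

```csharp
using System;

public static class ParallelDownloadTuning
{
    private const long FourKb = 4L * 1024;
    private const long FourMb = 4L * 1024 * 1024;

    // Picks a range size so that parallelIOCount * rangeSize stays near the memory budget.
    // Rounds down to a multiple of 4 KB and never returns less than 4 MB
    // (assumed constraints on rangeSizeInBytes).
    public static long RangeSizeFor(long memoryBudgetBytes, int parallelIOCount)
    {
        if (parallelIOCount <= 0)
            throw new ArgumentOutOfRangeException(nameof(parallelIOCount));

        long raw = memoryBudgetBytes / parallelIOCount;
        long aligned = raw - (raw % FourKb);
        return Math.Max(aligned, FourMb);
    }

    public static void Main()
    {
        // Starting point suggested by the docs: one parallel request per CPU.
        int parallelIOCount = Environment.ProcessorCount;
        long budget = 256L * 1024 * 1024; // example memory budget of 256 MiB

        Console.WriteLine(
            $"parallelIOCount={parallelIOCount}, rangeSizeInBytes={RangeSizeFor(budget, parallelIOCount)}");
    }
}
```

Note that when the budget is small relative to parallelIOCount, the 4 MB floor means actual memory use can exceed the budget, so you may want to lower parallelIOCount instead.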

The benefit of this method vs DownloadToFileAsync is that multiple 'slices' of the file are downloaded in parallel (at the same time). This can be beneficial for large files over fast internet connections (in most cases, I'd expect it to be 4-8 times faster).
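
For reference, a minimal sketch of the call itself, assuming a version of `Microsoft.Azure.Storage.Blob` recent enough to include DownloadToFileParallelAsync. The `ParallelBlobDownload` class and its parameters are hypothetical, and the 16 MB range size is just a starting value to tune from.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;
using Microsoft.Azure.Storage;
using Microsoft.Azure.Storage.Blob;

public static class ParallelBlobDownload
{
    // Hypothetical helper: downloads one blob to a file using parallel range requests.
    public static async Task DownloadAsync(
        string connectionString, string containerName, string blobName, string destinationPath)
    {
        CloudBlockBlob blob = CloudStorageAccount.Parse(connectionString)
            .CreateCloudBlobClient()
            .GetContainerReference(containerName)
            .GetBlockBlobReference(blobName);

        // Starting values only; measure throughput and adjust both.
        int parallelIOCount = Environment.ProcessorCount;
        long rangeSizeInBytes = 16L * 1024 * 1024;

        await blob.DownloadToFileParallelAsync(
            destinationPath, FileMode.Create, parallelIOCount, rangeSizeInBytes);
    }
}
```

In the question's code this would replace the DownloadToFileAsync call, keeping the same path and FileMode.Create.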