I'm trying to figure out the best performing approach when writing thousands of small Blobs to Azure Storage. The application scenario is the following:
- thousands of files are created or overwritten by a constantly running Windows service installed on a Windows Azure VM
- writing to the Temporary Storage available to the VM, the service can reach more than 9,000 file creations per second
- file sizes range between 1 KB and 60 KB
- other VMs running the same software create files at the same rate and with the same characteristics
- to build and keep a central repository up to date, another service running on each VM copies the newly created files from the Temporary Storage to Azure Blobs
- other servers should then read the most recent version of those blobs
Please note that, due to several constraints I'm omitting for brevity, it's currently not possible to modify the main service to create blobs directly instead of files on the Temporary Storage; and from what I'm currently seeing, that would also mean a slower creation rate, which is not acceptable per the original requirements.
This copy operation, which I'm testing in a tight loop over 10,000 files, seems to be capped at 200 blob creations per second. I was able to reach this result after tweaking the sample code named "Windows Azure ImportExportBlob" found here: http://code.msdn.microsoft.com/windowsazure/Windows-Azure-ImportExportB-9d30ddd5 with the async suggestions found in this answer: Using Parallel.Foreach in a small azure instance
I obtained this apparent maximum of 200 blob creations per second on an Extra Large VM with 8 cores, setting the "maxConcurrentThingsToProcess" semaphore accordingly. Network utilization during the test is at most 1% of the 10 Gbps shown in Task Manager, which means roughly 100 Mbps of the 800 Mbps that should be available on that VM size.
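For reference, the copy test boils down to something like the sketch below. This is a simplified illustration rather than my actual code (which is the tweaked ImportExportBlob sample): it's written against the Task-based storage client API for readability, the container name and concurrency values are made up, and the DefaultConnectionLimit line is just one of the client-side settings I assume could matter at this request rate.

```csharp
// Simplified sketch of the throttled copy loop; not the actual test code.
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Net;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;

static class BlobCopier
{
    // Illustrative value; in the real test this is sized according to the VM's 8 cores.
    static readonly SemaphoreSlim maxConcurrentThingsToProcess = new SemaphoreSlim(64);

    public static async Task CopyFilesAsync(string connectionString, IEnumerable<string> filePaths)
    {
        // Without raising this, outbound HTTP connections are limited to 2 per host.
        ServicePointManager.DefaultConnectionLimit = 100;

        CloudStorageAccount account = CloudStorageAccount.Parse(connectionString);
        CloudBlobClient client = account.CreateCloudBlobClient();
        CloudBlobContainer container = client.GetContainerReference("centralrepository");
        await container.CreateIfNotExistsAsync();

        // Start one upload task per file; the semaphore throttles how many run at once.
        var uploads = filePaths.Select(async path =>
        {
            await maxConcurrentThingsToProcess.WaitAsync();
            try
            {
                CloudBlockBlob blob = container.GetBlockBlobReference(Path.GetFileName(path));
                using (FileStream stream = File.OpenRead(path))
                {
                    await blob.UploadFromStreamAsync(stream);
                }
            }
            finally
            {
                maxConcurrentThingsToProcess.Release();
            }
        });

        await Task.WhenAll(uploads);
    }
}
```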
Dividing the total size copied by the elapsed time gives me around 10 MB/sec, which at 200 blobs/sec works out to an average file size of about 50 KB.
Is there some limitation on the traffic you can generate against Azure Storage, or should I use a different approach when writing so many small files?