1
votes

I have a trial Azure Blob Storage account. I am trying to upload 100,000 generated files from my local machine. The operation has already been running for over 17 hours and has uploaded only ~77,000 files. All the files were created by a simple bash script:

for i in {1..100000}
do
    echo $i
    echo $i > $1\\$i.txt
done

The upload code:

using(var stream = File.OpenWrite(textBoxManyUploadFileName.Text))
using(var writer = new StreamWriter(stream)) {
    foreach(var file in Directory.GetFiles(textBoxManyUploadFrom.Text)) {
        Guid id = Guid.NewGuid();
        storage.StoreFile(file, id, ((FileType)comboBoxManyUploadTypes.SelectedItem).Number);
        writer.WriteLine("{0}={1}", id, file);
    }
}

public void StoreFile(Stream stream, Guid id, string container) {
    try {
        var blob = GetBlob(id, container);
        blob.UploadFromStream(stream);
    } catch(StorageException exception) {
        throw TranslateException(exception, id, container);
    }
}

public void StoreFile(string filename, Guid id, int type = 0) {
    using(var stream = File.OpenRead(filename)) {
        StoreFile(stream, id, type);
    }
}

CloudBlob GetBlob(Guid id, string containerName) {
    var container = azureBlobClient.GetContainerReference(containerName);
    if(container.CreateIfNotExist()) {
        container.SetPermissions(new BlobContainerPermissions {
            PublicAccess = BlobContainerPublicAccessType.Container
        });
    }
    return container.GetBlobReference(id.ToString());
}

The first 10,000 files were uploaded in 20-30 minutes, then the speed dropped. I think it may be because the blob names are GUIDs and Azure is trying to build a clustered index. What is the problem, and how can I speed this up?

2
GetFiles returns strings, right? But StoreFile takes a stream... what am I missing? (I'm wondering where you dispose of the stream. Perhaps something is leaking.) You might want to just do for (int i = 0; i < 100000; i++) { container.GetBlobReference(Guid.NewGuid().ToString()).UploadText(i.ToString()); } to simplify what you're measuring. - user94559
I do not think that memory leaks or stream disposal affect the upload speed. The stream is created by calling File.OpenRead(filename). - brainstream
And are those streams properly disposed? - user94559
Based on your answer above, are you creating the stream from the file variable in this line of code: storage.StoreFile(file, id, ((FileType)comboBoxManyUploadTypes.SelectedItem).Number); In other words, is that line actually storage.StoreFile(File.OpenRead(file), id, ((FileType)comboBoxManyUploadTypes.SelectedItem).Number); in your code, and not storage.StoreFile(stream, id, ((FileType)comboBoxManyUploadTypes.SelectedItem).Number);? - Gaurav Mantri
@smarx I have updated the question. - brainstream

2 Answers

2
votes

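To upload many small files, you should use multiple threads: with a sequential loop, each upload waits for a full network round trip before the next one starts. You can use BeginUploadFromStream or Parallel.ForEach, for instance.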
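
As a rough illustration of the Parallel.ForEach approach, here is a minimal sketch. The ParallelUploader class, the UploadAll method, the uploadOne delegate and the maxParallel value of 16 are all assumptions made for this example; uploadOne stands in for the question's storage.StoreFile(file, id, type) call. The id-to-file pairs are collected in a thread-safe dictionary because the StreamWriter from the question should not be written to from several threads at once.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class ParallelUploader
{
    // uploadOne is a hypothetical stand-in for the question's
    // storage.StoreFile(file, id, type); it must be safe to call concurrently.
    public static IDictionary<Guid, string> UploadAll(
        IEnumerable<string> files,
        Action<string, Guid> uploadOne,
        int maxParallel = 16) // assumed value; tune it for your bandwidth
    {
        // Thread-safe map of blob id -> local file name.
        var uploaded = new ConcurrentDictionary<Guid, string>();
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallel };

        Parallel.ForEach(files, options, file =>
        {
            var id = Guid.NewGuid();
            uploadOne(file, id);   // the network-bound call runs on several threads
            uploaded[id] = file;   // record the mapping only after a successful upload
        });

        return uploaded;
    }
}

// Possible usage with the names from the question:
// var map = ParallelUploader.UploadAll(
//     Directory.GetFiles(textBoxManyUploadFrom.Text),
//     (file, id) => storage.StoreFile(file, id, type));
// foreach (var pair in map) writer.WriteLine("{0}={1}", pair.Key, pair.Value);
```

On .NET Framework, the default limit of two concurrent HTTP connections per host (ServicePointManager.DefaultConnectionLimit) may also need to be raised before the extra threads make a difference.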

1
votes

One more thing I noticed in your code: StoreFile() calls GetBlob(), which in turn calls CreateIfNotExist() on your blob container. Please note that this function also results in a call to the storage service, adding a delay to every upload (not to mention that you are also charged for a storage transaction each time you call it).

I would recommend that you call this function just once before starting your blob upload.
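
One way to do that, sketched under assumptions: cache each container the first time it is requested, so the create-and-set-permissions round trip happens once per container name rather than once per blob. The ContainerCache class and its members are hypothetical names for this example; the factory delegate is where the question's GetContainerReference / CreateIfNotExist / SetPermissions code would go.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical helper: runs the expensive "create container" factory
// at most once per container name, even under concurrent access.
public sealed class ContainerCache<TContainer>
{
    private readonly ConcurrentDictionary<string, Lazy<TContainer>> cache =
        new ConcurrentDictionary<string, Lazy<TContainer>>();
    private readonly Func<string, TContainer> createContainer;

    public ContainerCache(Func<string, TContainer> createContainer)
    {
        this.createContainer = createContainer;
    }

    public TContainer Get(string name)
    {
        // Lazy<T> guarantees a single factory invocation for the stored entry;
        // GetOrAdd guarantees all callers see the same Lazy instance.
        return cache.GetOrAdd(
            name,
            n => new Lazy<TContainer>(() => createContainer(n))).Value;
    }
}

// Possible usage with the names from the question:
// var containers = new ContainerCache<CloudBlobContainer>(name => {
//     var container = azureBlobClient.GetContainerReference(name);
//     if (container.CreateIfNotExist()) {
//         container.SetPermissions(new BlobContainerPermissions {
//             PublicAccess = BlobContainerPublicAccessType.Container
//         });
//     }
//     return container;
// });
// CloudBlob GetBlob(Guid id, string containerName) {
//     return containers.Get(containerName).GetBlobReference(id.ToString());
// }
```

With this in place, GetBlob() only builds a local blob reference for each file, and the only request per file is the upload itself.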