9 votes

I am working on an application where file uploads happen often and can be pretty large.

The files are uploaded to a Web API controller, which gets the stream from the request and passes it on to my storage service, which then uploads it to Azure Blob Storage.

I need to make sure that:

  • No temporary files are written on the Web API instance
  • The request stream is not read fully into memory before it is passed on to the storage service (to prevent out-of-memory exceptions)

I've looked at this article, which describes how to disable input stream buffering, but since many file uploads from many different users happen simultaneously, it's important that it actually does what it says on the tin.
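
If it helps, my understanding is that the article's approach boils down to replacing the host buffer policy selector, roughly like this (the class name and where it gets registered are my own; this assumes classic ASP.NET Web API hosted on IIS):

using System.Web.Http;
using System.Web.Http.Hosting;
using System.Web.Http.WebHost;

// Rough sketch: tell the Web API host not to buffer the request body,
// so it can be streamed as it arrives instead of being read into memory.
public class NoInputBufferPolicySelector : WebHostBufferPolicySelector
{
    public override bool UseBufferedInputStream(object hostContext)
    {
        // false = stream the request body instead of buffering it.
        return false;
    }
}

// Registered at startup, e.g. in WebApiConfig.Register(HttpConfiguration config):
// config.Services.Replace(typeof(IHostBufferPolicySelector), new NoInputBufferPolicySelector());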

This is what I have in my controller at the moment:

if (this.Request.Content.IsMimeMultipartContent())
{
    var provider = new MultipartMemoryStreamProvider();
    await this.Request.Content.ReadAsMultipartAsync(provider);
    var fileContent = provider.Contents.SingleOrDefault();

    if (fileContent == null)
    {
        throw new ArgumentException("No file content.");
    }

    var fileName = fileContent.Headers.ContentDisposition.FileName.Replace("\"", string.Empty);

    // I need to make sure this stream is ready to be processed by 
    // the Azure client lib, but not buffered fully, to prevent OoM.
    var stream = await fileContent.ReadAsStreamAsync();
}

I don't know how I can reliably test this.

EDIT: I forgot to mention that uploading directly to Blob Storage (circumventing my API) won't work, as I am doing some size checking (e.g. is this user allowed to upload 500 MB? Has this user used up their quota?).

Have you tried copying the input stream directly to the blob storage? – Yuval Itzchakov
That's what I am doing, but I need to make sure that I am not fully buffering the input stream before the blob storage client starts uploading, and I don't know how to test that it's actually happening. – Jeff
Have you tried profiling your app to see if it's buffering it before the read? – Yuval Itzchakov
Get a memory profiler and test your app. – Yuval Itzchakov
I've found that the file is indeed copied to memory before sending it off to Azure. This is a problem. – Jeff

2 Answers

11 votes

Solved it with the help of this Gist.

Here's how I am using it, along with a clever "hack" to get the actual file size without copying the file into memory first. Oh, and it's about twice as fast.

// Create an instance of our provider.
// See https://gist.github.com/JamesRandall/11088079#file-blobstoragemultipartstreamprovider-cs for implementation.
var provider = new BlobStorageMultipartStreamProvider();

// This is where the uploading is happening, by writing to the Azure stream
// as the file stream from the request is being read, leaving almost no memory footprint.
await this.Request.Content.ReadAsMultipartAsync(provider);

// We want to know the exact size of the file, but this info is not available to us before
// we've uploaded everything - which has just happened.
// We get the stream from the content (and that stream is the same instance we wrote to).
var stream = await provider.Contents.First().ReadAsStreamAsync();

// Problem: If you try to use stream.Length, you'll get an exception, because BlobWriteStream
// does not support it.

// But this is where we get fancy.

// Position == size, because the file has just been written to it, leaving the
// position at the end of the file.
var sizeInBytes = stream.Position;

Voilà, you have your uploaded file's size without having to copy the file into your web instance's memory.
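
If you don't want to click through to the Gist: the provider is essentially a MultipartStreamProvider whose GetStream opens a write stream straight onto a block blob. A simplified sketch (not the Gist verbatim; how the container is wired up is up to you, so treat the constructor and names as placeholders):

using System;
using System.IO;
using System.Net.Http;
using System.Net.Http.Headers;
using Microsoft.WindowsAzure.Storage.Blob;

public class BlobStorageMultipartStreamProvider : MultipartStreamProvider
{
    private readonly CloudBlobContainer _container;

    public BlobStorageMultipartStreamProvider(CloudBlobContainer container)
    {
        _container = container;
    }

    public override Stream GetStream(HttpContent parent, HttpContentHeaders headers)
    {
        // Use the client-supplied file name if there is one, otherwise a GUID.
        var contentDisposition = headers.ContentDisposition;
        var fileName = contentDisposition != null && !string.IsNullOrEmpty(contentDisposition.FileName)
            ? contentDisposition.FileName.Replace("\"", string.Empty)
            : Guid.NewGuid().ToString();

        // OpenWrite returns a stream that writes straight to blob storage, so
        // ReadAsMultipartAsync pushes the request body to Azure as it reads it.
        return _container.GetBlockBlobReference(fileName).OpenWrite();
    }
}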

As for getting the file length before the file is uploaded, that's not as easy, and I had to resort to a rather unpleasant method to get an approximation.

In the BlobStorageMultipartStreamProvider:

var approxSize = parent.Headers.ContentLength.Value - parent.Headers.ToString().Length;

This gives me a file size that is off by only a few hundred bytes (depending on the HTTP headers, I guess). That is good enough for me, as my quota enforcement can accept a few bytes being shaved off.
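
Wired into the provider, the check ends up looking roughly like this. EnforceQuota is just a sketch of a helper you would call at the top of GetStream, before OpenWrite; where the remaining-quota figure comes from is up to you:

// Goes inside BlobStorageMultipartStreamProvider; needs System.Net and System.Web.Http.
private void EnforceQuota(HttpContent parent, long quotaBytesRemaining)
{
    // Content-Length covers the whole multipart body; subtracting the serialized
    // headers gets within a few hundred bytes of the actual file size.
    var approxSize = parent.Headers.ContentLength.Value - parent.Headers.ToString().Length;

    if (approxSize > quotaBytesRemaining)
    {
        // Reject before a single byte is written to blob storage.
        throw new HttpResponseException(HttpStatusCode.RequestEntityTooLarge);
    }
}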

Just for showing off, here's the memory footprint, reported by the insanely accurate and advanced Performance Tab in Task Manager.

Before - using MemoryStream, reading the file into memory before uploading:

[screenshot of memory usage]

After - writing directly to Blob Storage:

[screenshot of memory usage]

7 votes

I think a better approach is to go directly to Azure Blob Storage from your client. By leveraging the CORS support in Azure Storage, you take that load off your Web API server, resulting in better overall scalability for your application.

Basically, you will create a Shared Access Signature (SAS) URL that your client can use to upload the file directly to Azure storage. For security reasons, it is recommended that you limit the time period for which the SAS is valid. Best practices guidance for generating the SAS URL is available here.
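
With the storage client library, generating such a SAS looks roughly like this (the connection string, container and blob names, and the 15-minute expiry are all placeholders; the guidance linked above covers the details):

// Requires the WindowsAzure.Storage package (Microsoft.WindowsAzure.Storage,
// Microsoft.WindowsAzure.Storage.Blob namespaces).
// Sketch: issue a short-lived, write-only SAS for a specific blob and hand the URL to the client.
var connectionString = "<storage connection string>"; // placeholder
var account = CloudStorageAccount.Parse(connectionString);
var container = account.CreateCloudBlobClient().GetContainerReference("uploads");
var blob = container.GetBlockBlobReference("some-upload.dat");

var sas = blob.GetSharedAccessSignature(new SharedAccessBlobPolicy
{
    Permissions = SharedAccessBlobPermissions.Write,
    SharedAccessExpiryTime = DateTimeOffset.UtcNow.AddMinutes(15)
});

// The client uploads straight to this URL; CORS must be enabled on the storage account.
var uploadUrl = blob.Uri + sas;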

For your specific scenario, check out this blog post from the Azure Storage team, where they discuss using CORS and SAS for exactly this case. There is also a sample application, so it should give you everything you need.