2
votes

I am uploading large files (1-10 GB) to Azure Storage and need to calculate the SHA1 hash of each file once it is uploaded. Can I calculate the SHA1 on the server, without having to download the file?

2 Answers

4
votes

Azure Blob Storage calculates an MD5 hash for a blob automatically when the blob is uploaded with Put Blob. See the description of the Content-MD5 header in the Get Blob Properties documentation, quoted below.

Content-MD5

If the Content-MD5 header has been set for the blob, this response header is returned so that the client can check for message content integrity. In version 2012-02-12 and newer, Put Blob sets a block blob’s MD5 value even when the Put Blob request doesn’t include an MD5 header.

So unless you specifically need SHA1, there is no need to calculate a hash for the blob yourself.
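
If the stored MD5 is enough, checking it against a local copy comes down to comparing two Base64 strings, since Content-MD5 is the Base64 encoding of the raw 16-byte MD5 digest. The following is only a minimal sketch: it assumes the Content-MD5 value has already been read from the blob's properties, and ContentMd5Check and its two methods are hypothetical names used for illustration.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

// Hypothetical helper, not part of any Azure library.
public static class ContentMd5Check
{
    // Computes the MD5 of a stream and returns it Base64-encoded,
    // which is the encoding used by the Content-MD5 header.
    public static string ComputeBase64Md5(Stream content)
    {
        using (MD5 md5 = MD5.Create())
        {
            return Convert.ToBase64String(md5.ComputeHash(content));
        }
    }

    // storedContentMd5 is assumed to be the value read from the blob's properties.
    public static bool Matches(string storedContentMd5, Stream localContent)
    {
        return string.Equals(storedContentMd5, ComputeBase64Md5(localContent), StringComparison.Ordinal);
    }
}
```

Passing a FileStream over the local file as localContent keeps memory use flat, because ComputeHash reads the stream in chunks.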

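For reference, here is a sample that calculates the SHA1 hash of a blob in storage without saving it to a local file. Note that the blob's bytes are still streamed to the client, and the hash is computed there as the data arrives.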

Synchronous:

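// Assumes the WindowsAzure.Storage client library; namespaces differ in other SDK versions.
using System.IO;
using System.Security.Cryptography;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;

CloudStorageAccount storageAccount = CloudStorageAccount.Parse("<StorageAccountConnectionString>");
CloudBlobClient     blobClient     = storageAccount.CreateCloudBlobClient();
CloudBlobContainer  container      = blobClient.GetContainerReference("<container-name>");
CloudBlob           blob           = container.GetBlobReference("<blob-name>");

// OpenRead streams the blob's content; nothing is written to disk.
using(Stream blobStream = blob.OpenRead())
{
    using (SHA1 sha1 = SHA1.Create())
    {
        // ComputeHash reads the stream to the end and returns the 20-byte SHA1 digest.
        byte[] checksum = sha1.ComputeHash(blobStream);
    }
}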

Async:

CloudStorageAccount storageAccount = CloudStorageAccount.Parse("<StorageAccountConnectionString>");
CloudBlobClient     blobClient     = storageAccount.CreateCloudBlobClient();
CloudBlobContainer  container      = blobClient.GetContainerReference("<container-name>");
CloudBlob           blob           = container.GetBlobReference("<blob-name>");

using(Stream blobStream = await blob.OpenReadAsync().ConfigureAwait(false))
{
    using (SHA1 sha1 = SHA1.Create())
    {
        byte[] checksum = await sha1.ComputeHashAsync(blobStream);
    }
}

// ComputeHashAsync extension method, adapted from https://www.tabsoverspaces.com/233439-computehashasync-for-sha1
// On .NET 5 and later, HashAlgorithm has a built-in ComputeHashAsync(Stream, CancellationToken).
// Extension methods must live in a static class; the class name here is arbitrary.
public static class HashAlgorithmExtensions
{
    public static async Task<byte[]> ComputeHashAsync(this HashAlgorithm algo, Stream stream, int bufferSize = 4096)
    {
        algo.Initialize();

        var buffer = new byte[bufferSize];
        int read;
        // Feed the hash one chunk at a time until the stream reports end of data.
        while ((read = await stream.ReadAsync(buffer, 0, buffer.Length).ConfigureAwait(false)) > 0)
        {
            algo.TransformBlock(buffer, 0, read, null, 0);
        }

        // Finalize with an empty block; this does not depend on the stream's Length or Position.
        algo.TransformFinalBlock(buffer, 0, 0);
        return algo.Hash;
    }
}

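
To see that feeding a hash in chunks with TransformBlock and TransformFinalBlock gives the same digest as a single ComputeHash call, here is a small self-contained sketch that runs against an in-memory stream instead of a blob. ChunkedSha1Demo, HashInChunks and ToHex are illustrative names, not part of any library.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

// Illustrative helper; the names are hypothetical.
public static class ChunkedSha1Demo
{
    // Hashes a stream in chunks of bufferSize bytes.
    public static byte[] HashInChunks(Stream input, int bufferSize)
    {
        using (SHA1 sha1 = SHA1.Create())
        {
            byte[] buffer = new byte[bufferSize];
            int read;
            while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
            {
                // Passing null as the output buffer is allowed for hash algorithms.
                sha1.TransformBlock(buffer, 0, read, null, 0);
            }
            // An empty final block completes the hash.
            sha1.TransformFinalBlock(buffer, 0, 0);
            return sha1.Hash;
        }
    }

    // Formats a digest as a lowercase hex string.
    public static string ToHex(byte[] hash)
    {
        return BitConverter.ToString(hash).Replace("-", "").ToLowerInvariant();
    }
}
```

Calling HashInChunks with a tiny buffer (for example 2 bytes) forces several TransformBlock calls, which makes it easy to compare the result with SHA1.Create().ComputeHash on the same bytes.
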
2
votes

Merging together a few posts, I created the following fake stream to calculate the MD5 of a blob. The advantage of a fake stream is that you never hold the whole file in memory: each chunk is hashed as it arrives and then discarded. It works well for me, though it could probably do with a little polishing.

class MD5StreamCalculator: Stream {
    MD5 md5Check;

    public MD5StreamCalculator() {
        md5Check = MD5.Create();
    }

    public string GetFinalMD5() {
        md5Check.TransformFinalBlock(new byte[0], 0, 0);
        byte[] hashBytes = md5Check.Hash;
        return Convert.ToBase64String(hashBytes);
    }

    public override bool CanRead {
        get {
            return false;
        }
    }

    public override bool CanSeek {
        get {
            return false;
        }
    }

    public override bool CanWrite {
        get {
            return true;
        }
    }

    public override long Length {
        get {
            throw new NotImplementedException();
        }
    }

    public override long Position {
        get {
            throw new NotImplementedException();
        }

        set {
            throw new NotImplementedException();
        }
    }

    public override void Flush() {
    }

    public override int Read(byte[] buffer, int offset, int count) {
        throw new NotImplementedException();
    }

    public override long Seek(long offset, SeekOrigin origin) {
        throw new NotImplementedException();
    }

    public override void SetLength(long value) {
        throw new NotImplementedException();
    }

    public override void Write(byte[] buffer, int offset, int count) {
        // Hash exactly the bytes the caller passed in, starting at offset rather than at 0.
        md5Check.TransformBlock(buffer, offset, count, null, 0);
    }
}

...

MD5StreamCalculator md5Stream = new MD5StreamCalculator();
targetBlockBlob.DownloadToStream(md5Stream);
Console.WriteLine("BASE64 = " + md5Stream.GetFinalMD5());