I am uploading large files (1-10 GB) to Azure Storage and need to calculate the SHA1 hash of each file once it is uploaded. Can I calculate the SHA1 on the server, without having to download the file?
2 Answers
Azure Blob Storage calculates an MD5 hash for a blob automatically when the blob is uploaded with Put Blob. See the excerpt below from the Get Blob Properties documentation.
Content-MD5
If the Content-MD5 header has been set for the blob, this response header is returned so that the client can check for message content integrity. In version 2012-02-12 and newer, Put Blob sets a block blob’s MD5 value even when the Put Blob request doesn’t include an MD5 header.
So unless you have a specific need for SHA1, it's not necessary to calculate it for the blob yourself.
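The integrity check described in the excerpt can be sketched as below. This is only an illustration: `ContentMd5Check` and its methods are hypothetical names, and `contentMd5Header` is assumed to be the Content-MD5 value you have already read from the blob's properties (a Base64-encoded MD5 digest). It also assumes the blob actually has a Content-MD5 value set.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

public static class ContentMd5Check
{
    // Computes the MD5 of a stream and encodes it as Base64,
    // which is the encoding the Content-MD5 header uses.
    public static string ComputeMd5Base64(Stream data)
    {
        using (MD5 md5 = MD5.Create())
        {
            return Convert.ToBase64String(md5.ComputeHash(data));
        }
    }

    // contentMd5Header: assumed to be the Content-MD5 value taken from the blob's properties.
    // localData: the content you expect the blob to hold (for example, the file you uploaded).
    public static bool Matches(string contentMd5Header, Stream localData)
    {
        return string.Equals(contentMd5Header, ComputeMd5Base64(localData), StringComparison.Ordinal);
    }
}
```

If `Matches` returns true for the original file, the stored blob has the same content as far as MD5 can tell, and nothing had to be downloaded to check it.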
For reference, here is a sample that calculates the SHA1 hash of a blob in storage by reading it as a stream, without first downloading it to a local file.
Synchronous:
CloudStorageAccount storageAccount = CloudStorageAccount.Parse("<StorageAccountConnectionString>");
CloudBlobClient blobClient = storageAccount.CreateCloudBlobClient();
CloudBlobContainer container = blobClient.GetContainerReference("<container-name>");
CloudBlob blob = container.GetBlobReference("<blob-name>");
using (Stream blobStream = blob.OpenRead())
{
    using (SHA1 sha1 = SHA1.Create())
    {
        byte[] checksum = sha1.ComputeHash(blobStream);
    }
}
Async:
CloudStorageAccount storageAccount = CloudStorageAccount.Parse("<StorageAccountConnectionString>");
CloudBlobClient blobClient = storageAccount.CreateCloudBlobClient();
CloudBlobContainer container = blobClient.GetContainerReference("<container-name>");
CloudBlob blob = container.GetBlobReference("<blob-name>");
using (Stream blobStream = await blob.OpenReadAsync().ConfigureAwait(false))
{
    using (SHA1 sha1 = SHA1.Create())
    {
        byte[] checksum = await sha1.ComputeHashAsync(blobStream);
    }
}
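// ComputeHashAsync extension method, adapted from https://www.tabsoverspaces.com/233439-computehashasync-for-sha1
// It must be declared inside a static class. On .NET 5 and later, HashAlgorithm already
// has a built-in ComputeHashAsync(Stream, CancellationToken), so this helper is only needed on older frameworks.
public static async Task<byte[]> ComputeHashAsync(this HashAlgorithm algo, Stream inputStream, int bufferSize = 4096)
{
    algo.Initialize();
    var buffer = new byte[bufferSize];
    int read;
    // Feed each chunk to the hash until ReadAsync reports end of stream (0 bytes read).
    // This avoids relying on Length and Position, which non-seekable streams do not support.
    while ((read = await inputStream.ReadAsync(buffer, 0, buffer.Length).ConfigureAwait(false)) > 0)
    {
        algo.TransformBlock(buffer, 0, read, null, 0);
    }
    // Finalize with an empty block; all data has already been hashed.
    algo.TransformFinalBlock(buffer, 0, 0);
    return algo.Hash;
}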
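As an alternative to calling TransformBlock and TransformFinalBlock by hand, the same chunk-by-chunk hashing can be written with `IncrementalHash`, which is available in .NET Core and recent .NET Framework versions. This is a different technique from the one in the answer, shown only as a minimal sketch; `IncrementalHashSketch` and `Sha1Hex` are hypothetical names, and the method works on any readable stream, so it can be tried locally without a storage account.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

public static class IncrementalHashSketch
{
    // Hypothetical helper: hashes any readable stream chunk by chunk and returns lowercase hex.
    public static string Sha1Hex(Stream input, int bufferSize = 4096)
    {
        using (IncrementalHash hash = IncrementalHash.CreateHash(HashAlgorithmName.SHA1))
        {
            var buffer = new byte[bufferSize];
            int read;
            // Read returns 0 at end of stream, so no Length or Position lookup is needed.
            while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
            {
                hash.AppendData(buffer, 0, read);
            }
            byte[] digest = hash.GetHashAndReset();
            return BitConverter.ToString(digest).Replace("-", "").ToLowerInvariant();
        }
    }
}
```

Passing a small `bufferSize` with a `MemoryStream` is a quick way to confirm that hashing in several chunks gives the same digest as hashing the data in one call. With a blob, the stream returned by `OpenRead()` could be passed in the same way.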
Merging together a few posts, I created the following fake stream to calculate the MD5 of a blob. The advantage of a fake stream is that you never hold the whole file in memory; there's no need to. It works well for me, though it could probably do with a little polishing.
// Write-only stream that hashes whatever is written to it instead of storing it.
class MD5StreamCalculator : Stream {
    MD5 md5Check;
    public MD5StreamCalculator() {
        md5Check = MD5.Create();
    }
    // Call once, after all data has been written, to finalize the hash.
    public string GetFinalMD5() {
        md5Check.TransformFinalBlock(new byte[0], 0, 0);
        byte[] hashBytes = md5Check.Hash;
        return Convert.ToBase64String(hashBytes);
    }
    public override bool CanRead {
        get {
            return false;
        }
    }
    public override bool CanSeek {
        get {
            return false;
        }
    }
    public override bool CanWrite {
        get {
            return true;
        }
    }
    public override long Length {
        get {
            throw new NotImplementedException();
        }
    }
    public override long Position {
        get {
            throw new NotImplementedException();
        }
        set {
            throw new NotImplementedException();
        }
    }
    public override void Flush() {
    }
    public override int Read(byte[] buffer, int offset, int count) {
        throw new NotImplementedException();
    }
    public override long Seek(long offset, SeekOrigin origin) {
        throw new NotImplementedException();
    }
    public override void SetLength(long value) {
        throw new NotImplementedException();
    }
    public override void Write(byte[] buffer, int offset, int count) {
        // Hash exactly the bytes the caller wrote, starting at the given offset.
        md5Check.TransformBlock(buffer, offset, count, null, 0);
    }
}
...
MD5StreamCalculator md5Stream = new MD5StreamCalculator();
targetBlockBlob.DownloadToStream(md5Stream);
Console.WriteLine("BASE64 = " + md5Stream.GetFinalMD5());