0
votes

I want to do an MD5 check when I download a file from GCS. However, it seems that I'm not getting the correct MD5 on my side. One example that I got:

local 1B2M2Y8AsgTpgAmY7PhCfg==, cloud JWSLJAR+M3krp1RiOAJzOw==

But I'm pretty sure the file isn't corrupted...

The following code is C# 7.0, using System.Security.Cryptography:

            using (var memStream = new MemoryStream())
            {
                _StorageClient.DownloadObject(bucketName, gcsObj.Name, memStream);
                try
                {
                    using (var md5 = MD5.Create())
                    {
                        var hash = md5.ComputeHash(memStream);
                        localMd5 = Convert.ToBase64String(hash);
                    }
                    Console.WriteLine($"local {localMd5}, cloud {gcsObj.Md5Hash}");
                }
                catch
                {
                    Console.WriteLine("Error getting md5 checksum");
                }
            }

Another question: the C# library I tried for computing a file's CRC32C value only returns a uint, but the GCS object's CRC32C value is a string. How do I compare them?

4
CRC32C returns a 32-bit number; if GCS is using a string, they have encoded the uint32 as a string, probably as hex. - zaph
The CRC32C is Base64 encoded, the same as the MD5. - Nuno Cruces

4 Answers

1
votes

From your sample, I'm assuming your sample hash comes from the x-goog-hash header?

If that is the case, can you check the value of x-goog-stored-content-encoding for the same file? If it is gzip, you uploaded a compressed copy to GCS and it is stored in gzip format. In that case, x-goog-hash is the MD5 of the gzipped copy stored on GCS.

To verify it you'd have to download the compressed version (not sure if that's possible with the C# library you're using), and check the MD5 hash of that.


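For the CRC32C, GCS Base64-encodes the four bytes in big-endian order, so on a little-endian machine reverse the bytes before encoding:

// Requires using System.Linq; assumes BitConverter.IsLittleEndian is true.
Convert.ToBase64String(BitConverter.GetBytes(crc32c).Reverse().ToArray())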

But the same thing applies: if it is gzipped, this is the CRC32C of the gzipped version.
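
To make the comparison concrete, here is a minimal sketch of the conversion in both directions. The class and method names (Crc32cEncoding, ToGcsString, FromGcsString) are made up for illustration and are not part of any Google library. It relies on GCS encoding the checksum's bytes in big-endian order and uses the crc32c value from the sample gsutil stat output in this answer:

```csharp
using System;

public static class Crc32cEncoding
{
    // Encodes a CRC32C value the way GCS reports it:
    // Base64 of the 4 bytes in big-endian order.
    public static string ToGcsString(uint crc32c)
    {
        byte[] bytes = BitConverter.GetBytes(crc32c);
        if (BitConverter.IsLittleEndian)
        {
            Array.Reverse(bytes); // GetBytes is little-endian on most platforms
        }
        return Convert.ToBase64String(bytes);
    }

    // Decodes a GCS-style Base64 CRC32C string back to a uint.
    public static uint FromGcsString(string encoded)
    {
        byte[] bytes = Convert.FromBase64String(encoded);
        if (BitConverter.IsLittleEndian)
        {
            Array.Reverse(bytes);
        }
        return BitConverter.ToUInt32(bytes, 0);
    }

    public static void Main()
    {
        uint crc = FromGcsString("kxvpkw==");
        Console.WriteLine($"0x{crc:X8}");    // 0x931BE993
        Console.WriteLine(ToGcsString(crc)); // kxvpkw==
    }
}
```

Either direction works for the comparison: encode the local uint and compare strings, or decode the GCS string and compare numbers.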


To check object metadata you can use:

gsutil stat gs://some-bucket/some-object

Sample output:

Creation time:          Sat, 20 Jan 2018 11:09:11 GMT
Update time:            Sat, 20 Jan 2018 11:09:11 GMT
Storage class:          MULTI_REGIONAL
Content-Encoding:       gzip
Content-Length:         5804
Content-Type:           application/msword
Hash (crc32c):          kxvpkw==
Hash (md5):             bfH75gryTXKgNosp1Smxvw==
ETag:                   CO7sotCz5tgCEAE=
Generation:             1516446551684718
Metageneration:         1

This object is stored in gzip format. Neither the MD5 nor the CRC32C will match those of the decompressed copy.

0
votes

You can use gsutil to get the MD5 or CRC32C hash of a local file.

  Process p = new Process();
  p.StartInfo.UseShellExecute = false;
  p.StartInfo.RedirectStandardOutput = true;
  p.StartInfo.FileName = @"C:\Program Files (x86)\Google\Cloud SDK\google-cloud-sdk\bin\gsutil.cmd";
  p.StartInfo.Arguments = $"hash -m \"{paths[i]}\"";
  p.Start();
  string output = p.StandardOutput.ReadToEnd();
  p.WaitForExit();

  var outputSplitted = output.Split(':');
  string hash = outputSplitted[outputSplitted.Length - 1].Replace("\t", "").Replace("\r", "").Replace("\n", "");
0
votes

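You must set Position to 0 before calculating the MD5. After DownloadObject writes to the stream, its position is at the end, so ComputeHash reads zero bytes. The local value in the question, 1B2M2Y8AsgTpgAmY7PhCfg==, is the MD5 of empty input: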

memStream.Position = 0;
var hash = md5.ComputeHash(memStream);
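
The effect can be reproduced without GCS. In this minimal sketch, a plain Write stands in for the DownloadObject call (an assumption for illustration; both leave the stream position at the end), and StreamHashDemo and Md5Base64 are hypothetical names:

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

public static class StreamHashDemo
{
    // Hashes the stream from its current position to the end
    // and returns the digest as Base64.
    public static string Md5Base64(Stream stream)
    {
        using (var md5 = MD5.Create())
        {
            return Convert.ToBase64String(md5.ComputeHash(stream));
        }
    }

    public static void Main()
    {
        using (var memStream = new MemoryStream())
        {
            // Stand-in for the download: writing leaves Position at the end.
            var payload = new byte[] { 1, 2, 3 };
            memStream.Write(payload, 0, payload.Length);

            // Nothing left to read, so this is the MD5 of empty input:
            // 1B2M2Y8AsgTpgAmY7PhCfg==
            Console.WriteLine(Md5Base64(memStream));

            // Rewind, then hash the actual content.
            memStream.Position = 0;
            Console.WriteLine(Md5Base64(memStream));
        }
    }
}
```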
-2
votes

You should not use the Convert.ToBase64String method.

Try this instead:

static string Md5HashToString(byte[] hash)
{
    // Create a new StringBuilder to collect the bytes
    // and create a string.
    StringBuilder sBuilder = new StringBuilder();

    // Loop through each byte of the hashed data 
    // and format each one as a hexadecimal string.
    for (int i = 0; i < hash.Length; i++)
    {
        sBuilder.Append(hash[i].ToString("x2"));
    }

    // Return the hexadecimal string.
    return sBuilder.ToString();
}
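
Note that the value GCS reports (gcsObj.Md5Hash in the question) is Base64, so a hex string cannot be compared with it directly; one side has to be converted first. A minimal sketch of going from Base64 to hex, with HashFormats and Base64ToHex as hypothetical names:

```csharp
using System;
using System.Text;

public static class HashFormats
{
    // Decodes a Base64 digest and renders the bytes as lowercase hex.
    public static string Base64ToHex(string base64)
    {
        byte[] bytes = Convert.FromBase64String(base64);
        var sb = new StringBuilder(bytes.Length * 2);
        foreach (byte b in bytes)
        {
            sb.Append(b.ToString("x2"));
        }
        return sb.ToString();
    }

    public static void Main()
    {
        // The "local" value from the question, which is the MD5 of empty input.
        Console.WriteLine(Base64ToHex("1B2M2Y8AsgTpgAmY7PhCfg=="));
        // d41d8cd98f00b204e9800998ecf8427e
    }
}
```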