For reference, see the discussion in this question:

What is the algorithm to compute the Amazon-S3 Etag for a file larger than 5GB?

The steps to recreate the MD5 hash are: 1) concatenate the MD5 hashes of each upload part, 2) convert the concatenated hash into binary, 3) take the MD5 hash of that binary, then 4) append a hyphen and the number of parts to the hash. That all sounds easy enough, but I'm struggling with step 3. To hash the binary I need to convert the string into a byte array, and to get the byte array I need to know which encoding to use. That's the part I'm missing. Do I use ASCII, UTF8, Unicode, BigEndian, or something else?

I've tried using the four formats above and none have produced the correct hash. I just can't seem to figure this one out. The code I'm using is:

CompleteMultipartUploadResponse compResp = new CompleteMultipartUploadResponse();
CompleteMultipartUploadRequest compReq = new CompleteMultipartUploadRequest();
string requestETagHash = "";

compResp = client.CompleteMultipartUpload(compReq);
string compETag = compResp.ETag;                                            
foreach (PartETag s in compReq.PartETags)
{
    requestETagHash += s.ETag.Replace('\"', ' ').Trim().Split('-').First();
}

StringBuilder sb = new StringBuilder();
foreach (char c in requestETagHash)
{
    try
    {
         sb.AppendFormat(Convert.ToString(Convert.ToInt16(c.ToString(), 16), 2).PadLeft(4, '0'));
    }
    catch (Exception ex)
    {
        MessageBox.Show("Hash error:\n\n" + ex.Message);
    }
}
//What encoding is used in this line?
byte[] b = System.Text.Encoding.UTF8.GetBytes(sb.ToString());

byte[] data = md5Hash.ComputeHash(b, 0, b.Length);

StringBuilder sBuilder = new StringBuilder();
for (int i = 0; i < data.Length; i++)
{
    sBuilder.Append(data[i].ToString("x2"));
}

Any help in solving this would be appreciated.

How are you uploading the actual data? It's not clear where text comes in here at all. – Jon Skeet
Note from the question you linked to: "Since MD5 checksums are hex representations of binary data, just make sure you take the MD5 of the decoded binary concatenation" – Jon Skeet
Basically it sounds like you're doing this too late - you should be computing each MD5 hash as a byte[], then a) concatenating those byte[] hashes together (so you can hash the result again); b) converting each hash into hex for the etag. – Jon Skeet
Thanks, Jon. I was going to comment the code to make it clearer what is happening where, but can't seem to figure out how to do that. Regarding the note you quoted, that is what is tripping me up, and why I was converting the hash (which is hex) to binary. There's a piece in there that I'm not getting. – user1750310
The hash doesn't start out as hex. You haven't shown the code that computes the hash of your data to start with. (You seem to be making the request right near the start, which is very odd to begin with... normally you'd do this before making the request, wouldn't you?) – Jon Skeet

1 Answer

Problem solved. Thank you, Jon! Your comment about my getting the hash too late got me thinking about where to find the hash's byte array, as opposed to the hex value I was using. I modified my code to get the hash byte array immediately after uploading each file part and to concatenate those arrays as I go. Then, after receiving the CompleteMultipartUploadResponse, I hash that concatenated array, and voila, I get the same hash as the ETag returned from S3 for the completed upload.
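
For anyone who lands here with the same problem, the approach described above can be sketched roughly as follows. This is a minimal illustration, not my exact code: the class and method names (MultipartETagSketch, HashPart, DecodeHexETag, ComputeETag) are hypothetical helpers and not part of the AWS SDK, and it assumes the ETag is built from plain MD5 digests as described in the linked question. The key point is that no text encoding is involved: the 16-byte digests themselves are concatenated and hashed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;

// Hypothetical helper for illustration only; not part of the AWS SDK.
public static class MultipartETagSketch
{
    // MD5 of one part's raw data, kept as the 16-byte digest (not as hex text).
    public static byte[] HashPart(byte[] partData)
    {
        using (MD5 md5 = MD5.Create())
        {
            return md5.ComputeHash(partData);
        }
    }

    // Alternative when only the hex part ETags are available:
    // decode each pair of hex characters back into one byte.
    public static byte[] DecodeHexETag(string eTag)
    {
        string hex = eTag.Trim('"', ' ');
        byte[] bytes = new byte[hex.Length / 2];
        for (int i = 0; i < bytes.Length; i++)
        {
            bytes[i] = Convert.ToByte(hex.Substring(i * 2, 2), 16);
        }
        return bytes;
    }

    // Concatenate the raw part digests in part order, hash the result,
    // then append a hyphen and the number of parts.
    public static string ComputeETag(IList<byte[]> partDigests)
    {
        byte[] concatenated = partDigests.SelectMany(d => d).ToArray();
        using (MD5 md5 = MD5.Create())
        {
            byte[] finalDigest = md5.ComputeHash(concatenated);
            string hex = BitConverter.ToString(finalDigest)
                .Replace("-", "")
                .ToLowerInvariant();
            return hex + "-" + partDigests.Count;
        }
    }
}
```

In the code from the question, this would mean either calling HashPart on each part's buffer as it is uploaded, or passing each PartETag's ETag through DecodeHexETag, and then comparing the result of ComputeETag against compResp.ETag with the surrounding quotes trimmed.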