3
votes

Is it possible to update a checksum (MD5, SHA1) after appending to a file, when all we have is the existing hash value?

  1. I have a file A already uploaded to the server, and I already have an MD5 file that contains its MD5 hash value.
  2. I want to append a new data block (byte[]) to file A, and I have to update the MD5 file with the new hash value.

Is it possible to update the MD5 hash value for the new file without reading the whole of file A again to compute the hash? File A may be very large, in which case rehashing it takes too much time.

3
why not store the new data as distinct files? - Scary Wombat

3 Answers

3
votes

If, and only if, you can choose the new data block to consist of one 0x80 byte, a certain number of 0x00 bytes depending on the size of file A, and 8 bytes containing the bit length of file A, followed by any other data you like, YES.

This is called a Length Extension Attack and is a cryptographic weakness of all hashes using the Merkle-Damgård construction, which includes MD5, SHA1 and the SHA-2 family, but not the SHA-3 family. This is not really a programming question and is more suitable for crypto.SX, where there are already quite a few questions about it, such as https://crypto.stackexchange.com/questions/17733/sha1-multipart-calculation and https://crypto.stackexchange.com/questions/3978/understanding-the-length-extension-attack
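To make the shape of that forced prefix concrete, here is a minimal sketch that computes the padding MD5 would append to a message of a given length. The class and method names (`Md5Padding`, `forLength`) are made up for illustration. It covers MD5 only: SHA1 uses the same layout but stores the 8-byte bit length big-endian.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper, for illustration only.
final class Md5Padding {
    /** Returns the padding MD5 appends to a message of the given length in bytes. */
    static byte[] forLength(long messageLengthBytes) {
        // After the 0x80 byte, add zeros until the length is 56 mod 64,
        // which leaves exactly 8 bytes in the block for the bit length.
        int zeros = (int) ((55 - (messageLengthBytes % 64) + 64) % 64);
        ByteBuffer buf = ByteBuffer.allocate(1 + zeros + 8).order(ByteOrder.LITTLE_ENDIAN);
        buf.put((byte) 0x80);
        buf.position(1 + zeros);              // skipped bytes are already zero
        buf.putLong(messageLengthBytes * 8);  // bit length, little-endian for MD5
        return buf.array();
    }
}
```

The old hash of file A is the internal state after A plus this padding has been processed, which is why the appended data must begin with exactly these bytes if only the old hash is available.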

However, if you save the hash's internal state (which is normally hidden) as of the last full block before the end of the data, then restore it and resume 'updating' from there with the (unrestricted) new data, as I believe the other answers more or less intended, you can compute the new hash (and the new saved state if you want to repeat this process). Whether and how you can access this state, and exactly how it needs to be represented, depends on the implementation you use.

You tagged Java although your actual question doesn't mention it. Doing this using the crypto Java provides (JCA) would be very difficult because JCA intentionally hides the details of all supported algorithms behind a series of simplified, abstracted facade classes. On the other hand, if you (re)code these hashes yourself, accessing the internal state could be quite easy. And if you use the BouncyCastle 'lightweight' implementation(s), it is probably not very hard, though maybe at risk of them changing the implementation, but I'd have to look in detail. Storing and retrieving the state may or may not be an issue.
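As a rough illustration of the "(re)code it yourself" route, the following is a sketch of a hand-written MD5 whose state can be saved and restored. Everything here (`ResumableMd5`, `save`, `restore`, the 88-byte saved layout) is a hypothetical design, not JCA or BouncyCastle API. For simplicity it stores the bytes of the trailing partial block together with the four state words and the total length, rather than re-reading them from the end of the file.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical teaching implementation; not tuned for speed.
final class ResumableMd5 {
    private static final int[] S = {7, 12, 17, 22, 5, 9, 14, 20, 4, 11, 16, 23, 6, 10, 15, 21};
    private static final int[] K = new int[64];
    static {
        for (int i = 0; i < 64; i++) {
            K[i] = (int) (long) (Math.abs(Math.sin(i + 1)) * 4294967296.0);
        }
    }

    private final int[] state = {0x67452301, 0xefcdab89, 0x98badcfe, 0x10325476};
    private final byte[] tail = new byte[64]; // bytes of the current, not yet full, block
    private long length;                      // total bytes fed in so far

    void update(byte[] data) {
        for (byte x : data) {
            tail[(int) (length % 64)] = x;
            length++;
            if (length % 64 == 0) compress();
        }
    }

    // Mixes the full 64-byte block held in `tail` into the four state words.
    private void compress() {
        ByteBuffer block = ByteBuffer.wrap(tail).order(ByteOrder.LITTLE_ENDIAN);
        int[] m = new int[16];
        for (int i = 0; i < 16; i++) m[i] = block.getInt();
        int a = state[0], b = state[1], c = state[2], d = state[3];
        for (int i = 0; i < 64; i++) {
            int f, g;
            switch (i / 16) {
                case 0:  f = (b & c) | (~b & d); g = i; break;
                case 1:  f = (d & b) | (~d & c); g = (5 * i + 1) % 16; break;
                case 2:  f = b ^ c ^ d;          g = (3 * i + 5) % 16; break;
                default: f = c ^ (b | ~d);       g = (7 * i) % 16; break;
            }
            int rotated = Integer.rotateLeft(a + f + K[i] + m[g], S[(i / 16) * 4 + i % 4]);
            a = d; d = c; c = b; b = b + rotated;
        }
        state[0] += a; state[1] += b; state[2] += c; state[3] += d;
    }

    /** Serializes state words, total length and the partial block (88 bytes). */
    byte[] save() {
        ByteBuffer buf = ByteBuffer.allocate(16 + 8 + 64);
        for (int w : state) buf.putInt(w);
        buf.putLong(length);
        buf.put(tail);
        return buf.array();
    }

    static ResumableMd5 restore(byte[] saved) {
        ByteBuffer buf = ByteBuffer.wrap(saved);
        ResumableMd5 md = new ResumableMd5();
        for (int i = 0; i < 4; i++) md.state[i] = buf.getInt();
        md.length = buf.getLong();
        buf.get(md.tail);
        return md;
    }

    /** Pads and finishes a copy, so this instance can keep accepting data. */
    byte[] digest() {
        ResumableMd5 copy = restore(save());
        long bitLength = length * 8;
        copy.update(new byte[] {(byte) 0x80});
        while (copy.length % 64 != 56) copy.update(new byte[] {0});
        copy.update(ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN).putLong(bitLength).array());
        ByteBuffer out = ByteBuffer.allocate(16).order(ByteOrder.LITTLE_ENDIAN);
        for (int w : copy.state) out.putInt(w);
        return out.array();
    }

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte x : bytes) sb.append(String.format("%02x", x & 0xff));
        return sb.toString();
    }
}
```

Under this design you would store the output of `save()` next to the MD5 file after hashing file A once. On each append, call `restore`, feed in only the new block with `update`, then write out both `digest()` and a fresh `save()`.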

2
votes

As far as I can see from the Wikipedia articles about MD5 and SHA1, this should be possible. You have to split the old hash back into the internal state variables (should be just some bit shifting) and then just continue the calculation of the new hash. Disclaimer: I didn't try it myself, I just read the Wikipedia pages about the algorithms.

Anyway: MD5 and SHA1 are broken. Please use the newer SHA-2 or SHA-3 hashes.

0
votes

I think you have to read the whole file again.

MD5 works (IIRC) by maintaining a bunch of internal 'registers' which change as the algorithm consumes each byte. So the only way to continue from a previous MD5 calculation would be if you had somehow stored the state of those 'registers' at the previous end-point.
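If the uploads are handled by one long-running process, one way to keep those 'registers' around is simply to keep the `MessageDigest` object alive and clone it whenever a hash is needed. The sketch below assumes that scenario. `RunningDigest` is a made-up wrapper, and `clone()` only works when the provider's digest implementation is cloneable, so treat that as an assumption to verify. It does not help once the process exits, since the standard API offers no way to write the state to disk.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical wrapper: keeps the digest state in memory between appends.
final class RunningDigest {
    private final MessageDigest md;

    RunningDigest(String algorithm) {
        this.md = newDigest(algorithm);
    }

    private static MessageDigest newDigest(String algorithm) {
        try {
            return MessageDigest.getInstance(algorithm);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalArgumentException("unknown algorithm: " + algorithm, e);
        }
    }

    /** Feeds only the newly appended bytes into the running state. */
    void append(byte[] block) {
        md.update(block);
    }

    /** Hash of everything appended so far; finishes a clone so the original keeps going. */
    byte[] currentHash() {
        try {
            return ((MessageDigest) md.clone()).digest();
        } catch (CloneNotSupportedException e) {
            throw new IllegalStateException("this provider's digest cannot be cloned", e);
        }
    }
}
```

Calling `digest()` on the original object would reset it, which is why the hash is taken from a clone.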

Have a look at the internals of an MD5 calculation - I think there are some in JavaScript which illustrate the general principle if you can't find a Java one. Even well written, it's kind of ugly (which I guess is the point).