I have a problem where as input I have a large byte array (typically >1K length) which has a computed CRC32. I need to replace a small chunk of the array with different values, and recalculate the CRC. Is there an efficient way to do this without looping through the entire original byte array? I suspect it's mathematically possible to take as inputs the original CRC, the bytes to be replaced, the new bytes, and calculate the new CRC with an algorithm whose loop size is just the number of bytes to be replaced, but it's outside my area of expertise, hence, just a suspicion. Thanks,
2 Answers
Yes, this can be done. Though at the 1K-byte level it would very likely be faster to just recompute the CRC on the whole thing.
As noted by Dmitry Rubanovich, you can use the fact that a CRC is a linear function where addition is replaced by exclusive-or. You can do better however than just avoiding the recalculation up to the first change. Wherever you have a long string of no changes, which is a long string of zeros in the exclusive-or of the two messages, you can compute the change in the CRC over that string in O(log(n)) time instead of O(n) time.
The way this is done is to generate a series of 32x32 bit matrices, each of which represents the application of a power-of-2 zero bytes to the CRC. E.g. 1, 2, 4, 8, etc. zero bytes. This can be done ahead of time, producing a static table. Then in order to, for example, evolve a CRC by 137 zeros, you multiply the current CRC as a bit vector by the matrices for 128, 8, and 1 zeros. The matrix multiplication is over GF(2), that is, addition is replaced by exclusive-or and multiplication is replaced by the and operation.
You can see an example of how this is done in the zlib function for combining CRC's.
A quick calculation using instruction counts indicates that the logarithmic approach will be faster on average for strings of zeros longer than about 300 bytes.
CRC is linear. So CRC(A xor B) = CRC(A) xor CRC(B). If you know exactly which bits you are changing, you can decompose the changing of the bits into an XOR operation. Then you only have to compute CRC starting with the highest significant bit of B. If it's close to the end of the stream, there will be a lot of leading zeros. You don't need to run the CRC over them because CRC of 0 is 0 (crc is a remainder of dividing a polynomial by another polynomial... if the leading polynomial coefficients are 0, you can ignore them).