5
votes

I'm trying to find a secure way to create checksums for files (larger than 10 GB!).

SHA256 is secure enough for me, but it is so computationally expensive that it is not suitable. I know that both SHA1 and MD5 checksums are insecure because of collisions.

So I think the fastest and safest way is to combine MD5 with SHA1 (SHA1+MD5), and I don't think there is a way to produce a colliding file that has the same MD5 and the same SHA1 at the same time.

So is combining SHA1+MD5 secure enough for a file checksum, or is there a collision attack against it?

I use C# on Mono in two ways (with and without BufferedStream):

    using System;
    using System.IO;
    using System.Security.Cryptography;

    // Hashes the file directly from a FileStream.
    public static string GetChecksum(string file)
    {
        using (FileStream stream = File.OpenRead(file))
        using (var sha = new SHA256Managed())
        {
            byte[] checksum = sha.ComputeHash(stream);
            return BitConverter.ToString(checksum).Replace("-", String.Empty);
        }
    }

    // Same hash, but reads through a 32 KB BufferedStream.
    // Note: disposing the BufferedStream also closes the caller's stream.
    public static string GetChecksumBuffered(Stream stream)
    {
        using (var bufferedStream = new BufferedStream(stream, 1024 * 32))
        using (var sha = new SHA256Managed())
        {
            byte[] checksum = sha.ComputeHash(bufferedStream);
            return BitConverter.ToString(checksum).Replace("-", String.Empty);
        }
    }

Update 1: I mean the SHA1 hash + the MD5 hash: first calculate the SHA1 of the file, then calculate the MD5 of the file, then concatenate the two strings.
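
A minimal sketch of the SHA1 + MD5 combination described in Update 1, assuming it is acceptable to read the file once and feed each block to both hashes rather than making two separate passes. The class name `CombinedChecksum`, the method `GetSha1PlusMd5`, and the 1 MB buffer size are illustrative choices and are not taken from the original code.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

public static class CombinedChecksum
{
    // Hypothetical helper: reads the stream once and updates both hashes per block.
    // Returns the SHA1 hex digest followed by the MD5 hex digest (40 + 32 characters).
    public static string GetSha1PlusMd5(Stream stream, int bufferSize = 1024 * 1024)
    {
        using (var sha1 = SHA1.Create())
        using (var md5 = MD5.Create())
        {
            var buffer = new byte[bufferSize];
            int read;
            while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
            {
                // A null output buffer means "only update the hash state".
                sha1.TransformBlock(buffer, 0, read, null, 0);
                md5.TransformBlock(buffer, 0, read, null, 0);
            }

            // Finalize both hashes with an empty last block.
            sha1.TransformFinalBlock(buffer, 0, 0);
            md5.TransformFinalBlock(buffer, 0, 0);

            return ToHex(sha1.Hash) + ToHex(md5.Hash);
        }
    }

    private static string ToHex(byte[] bytes)
    {
        return BitConverter.ToString(bytes).Replace("-", String.Empty);
    }
}
```

It could be called as `CombinedChecksum.GetSha1PlusMd5(File.OpenRead(file))`, ideally inside a `using` block so the file stream is closed afterwards.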

Update 2:

As @zaph suggested, I reimplemented my code (C# on Mono) according to what I read here, but it is not as fast as he said. For a 4.6 GB file, the time dropped from approximately 12 minutes to about 8 minutes, but SHA1+MD5 takes less than 100 seconds for the same file. So I still don't think using SHA256 instead is the right choice.

2
@zaph What do you use? I tested what you said with the default Mono (C#) implementation on Linux Mint. For a 4 GB file it takes about 12 minutes; for a 108 MB file it takes 19 seconds. So what's wrong? I use an Intel Core i7 CPU. - Mohammad Sina Karvandi
You must have a slow implementation; my CPU is a 2010 2.8 GHz Quad-Core Intel Xeon. My 2011 laptop is faster. Most Intel processors have instructions that can be used to make crypto operations faster. - zaph
Are you doing this in real time? - Erik Philips
@ErikPhilips I timed this with a mobile phone stopwatch before. I read something about my problem here: stackoverflow.com/questions/1177607/… But after 8 minutes nothing in particular has happened, and I'm still waiting. - Mohammad Sina Karvandi
@ᔕIᑎᗩKᗩᖇᐯᗩᑎᗪI If you don't need it in real time, then why the concern for speed? - Erik Philips

2 Answers

2
votes

There should be only a small difference between SHA-256 and a combination of MD5+SHA1.

The only way to know is to benchmark:

On my desktop:
SHA-256: 200 MB/s
MD5: 470 MB/s
SHA1: 500 MB/s (updated, previously incorrect)
MD5+SHA1: 240 MB/s

These figures are for the hashing only; disk read time is not included. The tests were done with a 1 MB buffer and averaged over 10 runs. The language was C and the library used was Apple's Common Crypto. The CPU was a 2.8 GHz Quad-Core Intel Xeon (2010 Mac Pro; my laptop is faster).
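
For anyone who wants to repeat this kind of hashing-only measurement on Mono, here is a rough C# sketch. It is not the C / Common Crypto code that produced the numbers above; `HashBenchmark`, `MeasureThroughput`, `Report`, and the 256 MB data size are hypothetical names and values chosen for illustration. It hashes an in-memory 1 MB buffer repeatedly so that disk reads are excluded, and it does a single run rather than averaging over 10.

```csharp
using System;
using System.Diagnostics;
using System.Security.Cryptography;

public static class HashBenchmark
{
    // Hypothetical helper: hashes `totalMegabytes` MB of in-memory data
    // in 1 MB blocks and returns the throughput in MB/s.
    public static double MeasureThroughput(Func<HashAlgorithm> factory, int totalMegabytes)
    {
        var buffer = new byte[1024 * 1024];
        new Random(42).NextBytes(buffer); // fixed seed so every run hashes the same data

        using (HashAlgorithm hash = factory())
        {
            Stopwatch watch = Stopwatch.StartNew();
            for (int i = 0; i < totalMegabytes; i++)
            {
                hash.TransformBlock(buffer, 0, buffer.Length, null, 0);
            }
            hash.TransformFinalBlock(buffer, 0, 0);
            watch.Stop();

            return totalMegabytes / watch.Elapsed.TotalSeconds;
        }
    }

    // Prints one line per algorithm; results depend on the machine and runtime.
    public static void Report(int totalMegabytes = 256)
    {
        Console.WriteLine("SHA-256: {0:F0} MB/s", MeasureThroughput(() => SHA256.Create(), totalMegabytes));
        Console.WriteLine("MD5:     {0:F0} MB/s", MeasureThroughput(() => MD5.Create(), totalMegabytes));
        Console.WriteLine("SHA1:    {0:F0} MB/s", MeasureThroughput(() => SHA1.Create(), totalMegabytes));
    }
}
```

Calling `HashBenchmark.Report()` on the machine in question gives a like-for-like comparison of whichever implementations the runtime selects, which is more informative than comparing against numbers from a different library.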

In the end it is 23% faster to use the combined MD5+SHA1.

Note: Most Intel processors have instructions that can be used to make crypto operations faster. Not all implementations utilize these instructions.

You might try a native implementation such as sha256sum.

1
votes

If by SHA1+MD5 you mean hashing with SHA-1 first and then using that digest as input to MD5, then you are not eliminating collisions completely, just potentially reducing the chance of one occurring.

Both SHA-1 and MD5 are fixed length cryptographic hash functions, and according to the Pigeonhole Principle collisions are bound to occur if the message length is greater than the digest size. There are two instances of this in your use case:

  • When you hash your arbitrary-length message with SHA-1
  • When the 160-bit SHA-1 digest is used as input to MD5

My point is that collisions will always exist. However, the probability of finding one is exceedingly small. If the sole purpose is file integrity (detecting accidental corruption rather than deliberate tampering), SHA-1 will do the job just fine on its own.
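
To make the two readings of "SHA1+MD5" concrete, here is a small sketch contrasting the chained construction discussed above with the concatenation described in the question's update. The class and method names (`CombinedDigests`, `Chained`, `Concatenated`) are made up for this illustration. In the chained form the result is only 16 bytes, and any two inputs that collide under SHA-1 necessarily produce the same final value; in the concatenated form the result is 36 bytes, and two inputs match only if they collide under both hashes.

```csharp
using System;
using System.Security.Cryptography;

public static class CombinedDigests
{
    // Chained: MD5(SHA1(data)). Output is 16 bytes (128 bits).
    public static byte[] Chained(byte[] data)
    {
        using (var sha1 = SHA1.Create())
        using (var md5 = MD5.Create())
        {
            byte[] inner = sha1.ComputeHash(data); // 20-byte SHA-1 digest
            return md5.ComputeHash(inner);         // hashed again with MD5
        }
    }

    // Concatenated: SHA1(data) followed by MD5(data). Output is 36 bytes (288 bits).
    public static byte[] Concatenated(byte[] data)
    {
        using (var sha1 = SHA1.Create())
        using (var md5 = MD5.Create())
        {
            byte[] a = sha1.ComputeHash(data);
            byte[] b = md5.ComputeHash(data);

            var result = new byte[a.Length + b.Length];
            Buffer.BlockCopy(a, 0, result, 0, a.Length);
            Buffer.BlockCopy(b, 0, result, a.Length, b.Length);
            return result;
        }
    }
}
```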

Related:

What checksum algorithm should I use?

Is MD5 still good enough to uniquely identify files?