3
votes

I'm adding in compression to my project with the aim of improving speed in the 3G Data communication from Android app to ASP.NET C# Server.

The methods I've researched/written/tested works. However, there's added white space after compression. And they are different as well. This really puzzles me.

Is it something to do with different implementation of the GZIP classes in both Java/ASP.NET C#? Is it something that I should be concerned with or do I just move on with .Trim() and .trim() after decompressing?


Java, compressing "Mary had a little lamb" gives:

Compressed data length: 42
Base64 Compressed String: H4sIAAAAAAAAAPNNLKpUyEhMUUhUyMksKclJVchJzE0CAHrIujIWAAAA

protected static byte[] GZIPCompress(byte[] data) {
    try {
        ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
        GZIPOutputStream gZIPOutputStream = new GZIPOutputStream(byteArrayOutputStream);

        gZIPOutputStream.write(data);
        gZIPOutputStream.close();

        return byteArrayOutputStream.toByteArray();
    } catch(IOException e) {
        Log.i("output", "GZIPCompress Error: " + e.getMessage());
        return null;
    }
}


ASP.NET C#, compressing "Mary had a little lamb"

Compressed data length: 137
Base64 Compressed String: H4sIAAAAAAAEAO29B2AcSZYlJi9tynt/SvVK1+B0oQiAYBMk2JBAEOzBiM3mkuwdaUcjKasqgcplVmVdZhZAzO2dvPfee++999577733ujudTif33/8/XGZkAWz2zkrayZ4hgKrIHz9+fB8/Ir7I6ut0ns3SLC2Lti3ztMwWk/8Hesi6MhYAAAA=

    public static byte[] GZIPCompress(byte[] data)
    {
        using (MemoryStream memoryStream = new MemoryStream())
        {
            using (GZipStream gZipStream = new GZipStream(memoryStream, CompressionMode.Compress))
            {
                gZipStream.Write(data, 0, data.Length);
            }

            return memoryStream.ToArray();
        }
    }
1
Your code shows you compressing bytes, but you've given the source as a string - how are you getting the bytes from the string? (When I use UTF-8, I get 42 bytes in .NET.) - Jon Skeet
What version of .NET are you using? - Jon Skeet
Just to answer here as well in case it's misleading. I didn't want to add too much code into the page to detract from the question. I'm using String.getBytes("UTF-8") and Encoding.UTF8.GetBytes() - Uknight
Even in .NET 4.5 that class has bugs. Use DotNetZip instead. - Mark Adler
@MarkAdler Thanks for the suggestion, I'll take a look at it at a later time. I would prefer to stick to native libraries though, as long as it still gets the job done reasonably well. - Uknight

1 Answers

5
votes

I get 42 bytes on .NET as well. I suspect you're using an old version of .NET which had a flaw in its compression scheme.

Here's my test app using your code:

using System;
using System.IO;
using System.IO.Compression;
using System.Text;

class Program
{
    static void Main(string[] args)
    {
        var uncompressed = Encoding.UTF8.GetBytes("Mary had a little lamb");
        var compressed = GZIPCompress(uncompressed);
        Console.WriteLine(compressed.Length);
        Console.WriteLine(Convert.ToBase64String(compressed));
    }

    static byte[] GZIPCompress(byte[] data)
    {
        using (var memoryStream = new MemoryStream())
        {
            using (var gZipStream = new GZipStream(memoryStream, 
                                                   CompressionMode.Compress))
            {
                gZipStream.Write(data, 0, data.Length);
            }

            return memoryStream.ToArray();
        }
    }
}

Results:

42
H4sIAAAAAAAEAPNNLKpUyEhMUUhUyMksKclJVchJzE0CAHrIujIWAAAA

This is exactly the same as the Java data.

I'm using .NET 4.5. I suggest you try running the above code on your machine, and compare the results.

I've just decompressed the base64 data you provided, and it is a valid "compressed" form of "Mary had a little lamb", with 22 bytes in the uncompressed data. That surprises me... and reinforces my theory that it's a framework version difference.

EDIT: Okay, this is definitely a framework version difference. If I compile with the .NET 3.5 compiler, then use an app.config which forces it to run with that version of the framework, I see 137 bytes as well. Given comments, it looks like this was only fixed in .NET 4.5.