1
votes

I have some data in XML format, around 816814 bytes long. It contains some image data as well as some text. We compress it with the ZLIB algorithm, and the compressed data is 487239 bytes long.

After compressing, we encode the data using BASE64Encoder. But the encoded data is larger than the compressed data: its length is 666748 bytes.

Why does the size increase after encoding? Is there a better encoding technique?

Regards, Siddesh


2 Answers

2
votes

As noted, when you are encoding binary 8-bit bytes with 256 possible values into a smaller set of characters, in this case 64 values, you will necessarily increase the size. For a set of n allowed characters, the expansion factor for random binary input will be log(256)/log(n), at a minimum.
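To put numbers on that formula, here is a small sketch that evaluates log(256)/log(n) for a few alphabet sizes. The class name and the chosen values of n are just for illustration.

```java
final class ExpansionFactor {
    /** Minimum number of output symbols per input byte for an alphabet of n symbols. */
    static double expansionFactor(int n) {
        if (n < 2 || n > 256) {
            throw new IllegalArgumentException("n must be in 2..256");
        }
        return Math.log(256) / Math.log(n);
    }

    public static void main(String[] args) {
        // Illustrative alphabet sizes: hex, Base64, 85 and 94 printable characters, 7-bit, full byte.
        int[] sizes = {16, 64, 85, 94, 128, 256};
        for (int n : sizes) {
            System.out.printf("n = %3d -> factor %.4f%n", n, expansionFactor(n));
        }
    }
}
```

For n = 64 the factor is exactly 4/3, which is where Base64's roughly 33% overhead comes from. Practical block encodings can only approach the bound: a base-85 scheme that maps 4 bytes to 5 characters has a factor of 1.25, slightly above the theoretical minimum for n = 85.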

If you would like to reduce this impact, then use more characters. Chances are that whatever medium you are using, it can handle more than 64 characters transparently. Find out how many by simply sending all 256 possible bytes, and see which ones make it through. Test the candidate set thoroughly, and then ideally find documentation of the medium that backs up that set of n < 256.

Once you have the set, then you can use a simple hard-wired arithmetic code to convert from the set of 256 to the set of n and back.
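One way to sketch such a conversion is plain radix conversion with BigInteger: treat the input as one big number and rewrite it in base n. This is a simpler stand-in for the arithmetic code mentioned above, not necessarily what was meant there. The class name RadixCodec and the 94-character printable-ASCII alphabet are hypothetical choices for illustration; substitute whatever set your medium actually passes through.

```java
import java.io.ByteArrayOutputStream;
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class RadixCodec {
    private final byte[] alphabet;               // the n byte values the medium is assumed to pass through
    private final int[] indexOf = new int[256];  // reverse lookup: byte value -> digit, or -1

    RadixCodec(byte[] alphabet) {
        if (alphabet.length < 2 || alphabet.length > 256) {
            throw new IllegalArgumentException("alphabet size must be in 2..256");
        }
        this.alphabet = alphabet.clone();
        Arrays.fill(indexOf, -1);
        for (int i = 0; i < alphabet.length; i++) {
            int b = alphabet[i] & 0xFF;
            if (indexOf[b] != -1) throw new IllegalArgumentException("duplicate symbol");
            indexOf[b] = i;
        }
    }

    byte[] encode(byte[] data) {
        // Prefix a 0x01 marker byte so leading zero bytes survive the round trip.
        byte[] prefixed = new byte[data.length + 1];
        prefixed[0] = 1;
        System.arraycopy(data, 0, prefixed, 1, data.length);
        BigInteger value = new BigInteger(1, prefixed);
        BigInteger radix = BigInteger.valueOf(alphabet.length);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while (value.signum() > 0) {
            BigInteger[] qr = value.divideAndRemainder(radix);
            out.write(alphabet[qr[1].intValue()]);  // least significant digit first
            value = qr[0];
        }
        return out.toByteArray();
    }

    byte[] decode(byte[] encoded) {
        BigInteger value = BigInteger.ZERO;
        BigInteger radix = BigInteger.valueOf(alphabet.length);
        for (int i = encoded.length - 1; i >= 0; i--) {  // most significant digit is last
            int digit = indexOf[encoded[i] & 0xFF];
            if (digit < 0) throw new IllegalArgumentException("symbol not in alphabet");
            value = value.multiply(radix).add(BigInteger.valueOf(digit));
        }
        byte[] prefixed = value.toByteArray();
        if (prefixed[0] != 1) throw new IllegalArgumentException("missing marker byte");
        return Arrays.copyOfRange(prefixed, 1, prefixed.length);  // drop the marker
    }

    public static void main(String[] args) {
        byte[] alphabet = new byte[94];  // printable ASCII 0x21..0x7E, an assumed "safe" set
        for (int i = 0; i < alphabet.length; i++) alphabet[i] = (byte) (0x21 + i);
        RadixCodec codec = new RadixCodec(alphabet);
        byte[] data = {0, (byte) 0xFF, 0x10, (byte) 0x80};
        byte[] encoded = codec.encode(data);
        System.out.println(new String(encoded, StandardCharsets.US_ASCII));
        System.out.println(Arrays.equals(codec.decode(encoded), data));
    }
}
```

Converting the whole input as a single number takes time quadratic in its length, so for a payload of several hundred kilobytes you would apply this to fixed-size blocks instead, at the cost of a little extra overhead per block.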

2
votes

That is perfectly normal.

Base64 is required if your transport medium is designed to carry text rather than binary data (e.g. XML).

So your ZLIB-compressed data gets Base64 encoded.

Put plainly, the encoder has to map arbitrary bytes, including "non-ASCII" ones, to an ASCII form in a way that can be reversed.

As a rule of thumb, it's around a 33% size increase, because every 3 input bytes become 4 output characters ( http://en.wikipedia.org/wiki/Base64#Examples )
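The sizes in the question can be checked with a few lines of arithmetic. The sketch below uses java.util.Base64 (Java 8+), which is an assumption, since the question only says BASE64Encoder. The helper names are made up for illustration.

```java
import java.util.Base64;

final class Base64Size {
    /** Length of padded Base64 output without line breaks for n input bytes. */
    static long plainLength(long n) {
        return 4 * ((n + 2) / 3);
    }

    /** Length when a break of breakLen bytes follows every lineLen characters (none after the last line). */
    static long wrappedLength(long n, int lineLen, int breakLen) {
        long chars = plainLength(n);
        long breaks = chars == 0 ? 0 : (chars - 1) / lineLen;
        return chars + breaks * breakLen;
    }

    public static void main(String[] args) {
        int compressed = 487239;             // compressed size from the question
        byte[] dummy = new byte[compressed]; // content does not affect the encoded length

        System.out.println(plainLength(compressed));                      // 649652
        System.out.println(Base64.getEncoder().encode(dummy).length);     // 649652
        System.out.println(wrappedLength(compressed, 76, 2));             // 666748
        System.out.println(Base64.getMimeEncoder().encode(dummy).length); // 666748
    }
}
```

The pure 4/3 expansion gives 649652 bytes. The 666748 bytes reported in the question match 76-character lines separated by CRLF, so the extra few percent over 33% most likely come from line breaks inserted by the encoder. That is an inference from the numbers, not something the question states. If the receiver accepts unbroken Base64, an encoder without line breaks saves those bytes.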

This is the downside of Base64. You are better off using a protocol that supports file transfer, but for files embedded within XML you are pretty much out of options.