0
votes

I'm trying to use zlib in an iPhone app to compress a text file into a gzip file as a test. I am using the following code:

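// Build the output path by replacing the file extension with ".gz".
const char *s = [[Path stringByReplacingOccurrencesOfString:[NSString stringWithFormat:@".%@", [Path pathExtension]] withString:@".gz"] UTF8String];
// gzFile is already a pointer type, so no extra * or cast is needed.
gzFile fi = gzopen(s, "wb");
const char *c = readFile(Path.UTF8String);
// gzwrite() takes an unsigned length; strlen() stops at the first NUL byte.
gzwrite(fi, c, (unsigned)strlen(c));
gzclose(fi);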

where readFile() returns a const char* that was obtained from the file using the fgets() function. The problem is that when I use this to compress a file, the result is not smaller: the gzip file is larger than the original file. For example, I have a text file that is 90 bytes, and after using this method the gzip file is 98 bytes. Why isn't the gzip file smaller than the original?

3
Any kind of zip compression will add a header to identify the format and provide a file name and other overall structure. For small files it's entirely possible that this overhead will be larger than the compression savings. - Mark Ransom
Compress a zero-sized file to find the overhead. - Martin York
@pst I considered adding that point to my comment but since it didn't apply to this specific case I figured it was just noise. Text is almost always compressible. - Mark Ransom

3 Answers

7
votes

The gzip format includes fixed-size framing: a header of at least 10 bytes and an 8-byte trailer (a CRC-32 and the original length). Because you are compressing so little data, this framing is larger than the space you are saving.

90 bytes is generally not worth compressing.

http://www.gzip.org/zlib/rfc-gzip.html#header-trailer

2
votes

Regardless of the compression algorithm used, there's always a chance that the output will be larger than the input; otherwise it wouldn't be possible to encode every combination of input bit patterns.

As already stated, in your particular case the problem seems to be the very small file size compared to the header overhead.

Nevertheless, it is good to keep in mind that there's never a guarantee the "compressed" file will be smaller.

1
votes
  1. The data you are trying to compress is too small and does not contain much redundancy, so there is very little to compress. Compression algorithms work, to put it very simply, by eliminating repeating sequences in data. In 90 bytes, you probably don't have much redundancy, unless it's text like "aaaaaaa....".
  2. Fixed header overhead, as already mentioned.

Try a bigger data file.