How is the gzip file size encoded?

Question

The gzip file format stores the (uncompressed/original) file size in the last 4 bytes of the compressed file. The "gzip -l" command reports the compressed and uncompressed sizes, the compression ratio, and the original filename.

Looking around Stack Overflow, I found a couple of mentions of decoding the size stored in the last 4 bytes.

How is the size encoded: big-endian (most significant byte first) or little-endian (least significant byte first)? And is the value signed or unsigned?

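This code snippet seems to work for me:

#include <stdio.h>
#include <stdint.h>

/* Read ISIZE (original size mod 2^32) from the gzip trailer.
   fh must be an open file handle, opened in binary mode ("rb"). */
uint32_t gzip_isize(FILE *fh)
{
    unsigned char szbuf[4];
    if (fseek(fh, -4L, SEEK_END) != 0 || fread(szbuf, 1, 4, fh) != 4)
        return 0; /* error; note that 0 is also a valid size */
    /* Little-endian; cast before shifting so a byte >= 0x80 cannot overflow int */
    return (uint32_t)szbuf[0] | ((uint32_t)szbuf[1] << 8)
         | ((uint32_t)szbuf[2] << 16) | ((uint32_t)szbuf[3] << 24);
}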

Here are a couple of related posts, which seem to imply little-endian and an unsigned 32-bit value (0 to 4 GB - 1):

Determine uncompressed size of GZIP file

GZIPOutputStream not updating Gzip size bytes

Determine size of file in gzip

Gzip.org has more information about Gzip

See this answer for why that length should in general not be relied upon. — Mark Adler
Agreed. For a single file compressed once (one gzip member) whose uncompressed size is under 2^32 bytes, the RFC gives you a way to read the last 4 bytes to get the file size. Perhaps not completely general, but still very useful. — ChuckCottrill
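To make the "under 2^32 bytes" caveat concrete: because the trailer holds the length modulo 2^32, inputs whose sizes differ by a multiple of 2^32 produce the same 4-byte value, so a 5 GiB input reports 1 GiB. Separately, for a file made of several concatenated gzip members, the last 4 bytes describe only the final member. A minimal sketch of the truncation, where isize_for_length is a hypothetical helper, not a library function:

```c
#include <stdint.h>

/* Hypothetical helper: the ISIZE value a gzip writer stores for an
   input of n bytes, i.e. n modulo 2^32. */
uint32_t isize_for_length(uint64_t n)
{
    return (uint32_t)(n & 0xFFFFFFFFu); /* keep only the low 32 bits */
}
```

For example, isize_for_length(5368709120) and isize_for_length(1073741824) both give 1073741824, so the trailer alone cannot tell those two inputs apart.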

Medinoc · Accepted Answer · 2014-09-24T21:50:22

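The RFC says it's the original size modulo 2^32, which means uint32_t. It also states that all multi-byte numbers in the format are stored least-significant byte first, and experimentation using a .NET GZipStream confirms it is little-endian.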
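On the writing side, that description translates into storing the length modulo 2^32, least-significant byte first. A minimal sketch, assuming a hypothetical helper store_isize (not part of zlib or any other library):

```c
#include <stdint.h>

/* Hypothetical helper: write ISIZE for an input of n bytes into out[0..3],
   least-significant byte first, as RFC 1952 specifies for multi-byte fields. */
void store_isize(uint64_t n, unsigned char out[4])
{
    uint32_t isize = (uint32_t)n; /* length modulo 2^32 */
    out[0] = (unsigned char)(isize & 0xFF);
    out[1] = (unsigned char)((isize >> 8) & 0xFF);
    out[2] = (unsigned char)((isize >> 16) & 0xFF);
    out[3] = (unsigned char)((isize >> 24) & 0xFF);
}
```

An input of 5 bytes yields 05 00 00 00, and a length of 0x12345678 yields 78 56 34 12, which is why a reader treats the last byte of the file as the most significant.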

RFC 1952
