0
votes

I am using Ruby's Zlib library to decompress a smallish (10k) gzip file (in memory, using a StringIO), and it's taking approximately 2.5 seconds to decompress. Compressing the data takes ~100ms, so I don't understand why decompression is taking over an order of magnitude longer than compression.

My function takes a StringIO object (holding the compressed data) and returns an array of integers, each built from int_size bytes (3 by default), like:

def decompress(io, int_size = 3)
  array = Array.new(262144)
  i = 0
  io.rewind
  gz = Zlib::GzipReader.new(io)
  until gz.eof?
    buffer = gz.read(int_size)
    array[i] = buffer.unpack('C*').inject { |r, n| r << 8 | n }
    i += 1
  end
  array
end

The same file decompresses at the OS X command line in the blink of an eye.

Is there a faster way to decompress the file, perhaps a faster library, or a way to use the local system's gzip to make this much faster than it is now?

2
Use a system tool whenever you can; those tools are surprisingly efficient. They're super optimized and very reliable. - yeyo
Yeah, that's what I thought, but HOW do I do that? - Ash
Yes, there's something very wrong. 10K takes about 150 microseconds to decompress on my four-year-old 2 GHz i7. - Mark Adler

2 Answers

0
votes

I'm not sure what's going on there (I reproduced the slowness only with a highly compressed gzip file), but decompressing all at once is faster, something like this:

def decompress(io, int_size = 3)
    array = Array.new(262144)
    i = 0
    io.rewind
    gz = Zlib::GzipReader.new(io)
    dec = gz.read
    seq = StringIO.new(dec, "rb")
    until seq.eof?
        buffer = seq.read(int_size)
        array[i] = buffer.unpack('C*').inject { |r, n| r << 8 | n }
        i += 1
    end
    array
end
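
If you want to check the difference on your own machine, here is a rough, self-contained sketch that times reading in int_size chunks against reading everything at once. The sample data and the method names decompress_chunked and decompress_at_once are made up for illustration, and Zlib.gzip assumes Ruby 2.4 or later. Unlike the function above, both helpers return exactly as many integers as the data holds rather than a preallocated 262144-element array.

```ruby
require 'zlib'
require 'stringio'
require 'benchmark'

# Hypothetical test data: 10,000 three-byte big-endian integers, gzipped in memory.
values = Array.new(10_000) { |i| (i * 37) % (1 << 24) }
raw = values.flat_map { |v| [v >> 16 & 0xFF, v >> 8 & 0xFF, v & 0xFF] }.pack('C*')
compressed = Zlib.gzip(raw) # assumes Ruby >= 2.4

# Reads int_size bytes at a time straight from the GzipReader (the question's approach).
def decompress_chunked(io, int_size = 3)
  io.rewind
  gz = Zlib::GzipReader.new(io)
  result = []
  until gz.eof?
    result << gz.read(int_size).unpack('C*').inject { |r, n| r << 8 | n }
  end
  result
end

# Decompresses everything with a single read, then slices the plain string.
def decompress_at_once(io, int_size = 3)
  io.rewind
  seq = StringIO.new(Zlib::GzipReader.new(io).read)
  result = []
  until seq.eof?
    result << seq.read(int_size).unpack('C*').inject { |r, n| r << 8 | n }
  end
  result
end

chunked = at_once = nil
t_chunked = Benchmark.realtime { chunked = decompress_chunked(StringIO.new(compressed)) }
t_at_once = Benchmark.realtime { at_once = decompress_at_once(StringIO.new(compressed)) }
puts format('chunked: %.4fs, at once: %.4fs', t_chunked, t_at_once)
```

The timings will depend on your Ruby version and on how compressible the data is, so treat the numbers as a comparison on your own setup rather than a general result.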

Faster still would be to use map instead of a loop:

def decompress(io, int_size = 3)
    io.rewind
    gz = Zlib::GzipReader.new(io)
    dec = gz.read
    dec.unpack('C*').each_slice(int_size).to_a.map {|t| t.inject {|r,n| r << 8 | n}}
end
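
As for using the system gzip, as asked in the question, something along these lines should work. This is only a sketch: it assumes a gzip executable is on the PATH, and the helper names system_gunzip and decompress_via_system are mine, not part of any library. It pipes the compressed bytes through `gzip -dc` with Open3.capture2 and then unpacks the result in one go.

```ruby
require 'open3'
require 'zlib'

# Pipes gzip-compressed bytes through the system `gzip -dc` and returns the raw bytes.
# Assumes a `gzip` executable is available on the PATH.
def system_gunzip(compressed)
  out, status = Open3.capture2('gzip', '-dc', stdin_data: compressed, binmode: true)
  raise "gzip exited with status #{status.exitstatus}" unless status.success?
  out
end

# Hypothetical helper: decompress via the system tool, then build the integers.
def decompress_via_system(compressed, int_size = 3)
  system_gunzip(compressed)
    .unpack('C*')
    .each_slice(int_size)
    .map { |t| t.inject { |r, n| r << 8 | n } }
end

# Usage: two 3-byte big-endian integers, compressed in memory (Zlib.gzip needs Ruby >= 2.4).
sample = [1, 2, 3, 0, 1, 0].pack('C*')
result = decompress_via_system(Zlib.gzip(sample))
```

I haven't measured it, but for a 10k payload the cost of spawning a process may well outweigh any gain over in-process Zlib, so I would only reach for this if the single-read version is still too slow.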

0
votes

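You can also use ruby-zstds, which has an API similar to gzip's. Zstd compression and decompression are very fast, but it is a different format, so the data would have to be compressed with zstd rather than gzip. Please give it a try.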