0
votes

I'm trying to understand how php gzdeflate/gzinflate works. if I understand it's a simple bitstream with a three bit header. So, it would be possible to get only the first bits and see what is compressed. So by taking first bits, we could theoretically (?) extract bytes after bytes?

In fact I have lost some bits of a file (around the first 40-50 bytes with some missing bits, not all bits, just some of them). I just want to know if I can do an intelligent "bruteforce" in order to recreate the first bytes which can fully gzdeflate() the file. I know it's php code, so extracted bytes should be ASCII only. I tried to bruteforce all bits, but it's too long. So, if I could bruteforce bits after bits, it would be preferable.

(If there is a reimplementation in python which is readable, that would help me a lot). I've read http://www.zlib.net/feldspar.html which is more on how to compress data, I want to uncompress it.

Thank you

EDIT: Let's take an example. Here is my data (in hex):

39e0 6fb2 41eb ....

This data is substracted from a key. This key is used in its hexadecimal format, so I have only 16 possibilities for each char.

Algo is this one:

(ciphered - key) % 256 = deflate_data

Key is in hex form. So first byte key can only be: 0x30 -> 0x39 and 0x61 -> 0x66 I have only choices (in little endian for clarity, first bit is last block, two next bits are encoding type):

Key -> deflate
0 -> 10010000  --> No, if .00 code, all other bits must be 00
1 -> 00010000  --> No, if .00 code, all other bits must be 00
2 -> 11100000  --> No, .11 is reserved
3 -> 01100000  --> No, .11 is reserved
4 -> 10100000  --> Maybe?
5 -> 00100000  --> Maybe?
6 -> 11000000  --> Maybe?
7 -> 01000000  --> Maybe?
8 -> 10000000  --> Uncompressed? Must check LEN and NLEN
9 -> 00000000  --> Uncompressed? Must check LEN and NLEN
a -> 00011011  --> No, .00 should hav all others bits to 0
b -> 11101011  --> No, .11 is reserved
c -> 01101011  --> No, .11 is reserved
d -> 10101011  --> Maybe?
e -> 00101011  --> Maybe?
f -> 11001011  --> Maybe?

so the first byte of my key could be: 4,5,6,7 or d,e,f . Some of them used fixed dictionary. So is it theorically possible to try next bytes? Other bytes are dynamic dictionary. So is it possible to create a huffman tree with next bytes? Wrong key will likely produce impossible huffman trees. When I have few possibilities left, I could try to bruteforce remaining keys.

8 and 9 can be tested easily.

The key is constructed like this:

MD5(pass[::-1])+MD5(pass[:len(pass)])

So, theorically, key can be 32 to 50-60 chars, depending on the length of the key.

1

1 Answers

0
votes

So, it would be possible to get only the first bits and see what is compressed

No. 3-bit header just defines if it's a last encoded block or not and block's encoding type.

I just want to know if I can do an intelligent "bruteforce" in order to recreate the first bytes which can fully gzdeflate() the file

Unlikely. Usually most blocks will fall under encoding type 10 - compressed with dynamic Huffman codes. So according to RFC1951 - 3.2.3. Details of block format you will be stuck at reading first block Huffman code trees. If you can't decode this tree you will not even know where your first block ends, because you may have lost your end-of-block marker.

What you can try

If your file integrity is not so important, i.e. if it's text,XML,csv or something where you will benefit from partial file recovery - then you may succeed by this algo:

  1. Choose next brute-forced block length
  2. Remove block_length bytes from the start of input stream
  3. pass resulting stream into gzinflate()
  4. if success - you've got uncompressed data (except for first block), otherwise - repeat all steps

However, because in some blocks there may be back-references to previous blocks (including to your damaged and because of that removed first block) - you may be out of luck too