
I have a source text and a copy of it, supposedly deflated with zlib and then backslash-escaped, embedded in another text file. I have no documentation on the encoding other than that it uses zlib with nominal escaping for \0, \t, \n, \r, quotes, etc.

The unescaped data has:

first four bytes: 1A 9B 02 00
last four bytes: 76 18 23 82

and inflate complains that it has an invalid header.

When I deflate/inflate the matching source text myself using zlib 1.2.5, I get:

first four bytes: 78 9C ED 7D

Can someone suggest what compression is being used given the header bytes? I haven't found any magic numbers or header formula that actually uses those.

EDIT: Here are the relevant files...

  • codedreadbase.cohdemo is the source text file with the escaped embedded section following the BASE verb. Escapes are:

    \n = (newline) \r = (return) \0 = 0 (NULL) \t = tab \q = " \s = ' \d = $ \p = %

  • codedreadbase.deflated is what I am passing to zlib inflateInit/inflate*/inflateEnd after unescaping the above within the double quotes.

  • codedreadbase.txt is the original text of the embedded section.
Please supply the full version of the text with the alleged embedded deflate stream. - Mark Adler
Added relevant files to post. - redgiant
Stripping the first 4 bytes before inflating, in case they were a prefix, didn't help, even though bytes 5-8 (78 5E ED 7D) look very close to the first 4 bytes I got when I independently deflated the source text myself (78 9C ED 7D). - redgiant

1 Answer


Your first four bytes, 1A 9B 02 00, are the length of the uncompressed data in little-endian order: 170778 in decimal. You have indeed found the start of a valid zlib stream in the next four bytes, 78 5E ED 7D. You just need to properly extract the binary compressed stream from the escaped format. I had no problem doing so and recovered codedreadbase.txt exactly.
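The arithmetic behind that reading can be checked mechanically. The following is only a minimal sketch: read_le32 and looks_like_zlib_header are hypothetical helper names, not part of zlib, and the header test applies the rules from the zlib format specification (RFC 1950): compression method 8, window-size field at most 7, and the two header bytes taken as a big-endian 16-bit number being a multiple of 31.

```c
#include <stdint.h>

/* Hypothetical helper: read a 32-bit little-endian value from p[0..3]. */
uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0]
         | (uint32_t)p[1] << 8
         | (uint32_t)p[2] << 16
         | (uint32_t)p[3] << 24;
}

/* Hypothetical helper: return 1 if p[0..1] look like a zlib (RFC 1950)
   stream header, 0 otherwise. This only inspects the two header bytes;
   it does not validate the deflate data that follows. */
int looks_like_zlib_header(const unsigned char *p)
{
    unsigned cmf = p[0];
    unsigned flg = p[1];

    if ((cmf & 0x0F) != 8)      /* CM: 8 means deflate */
        return 0;
    if ((cmf >> 4) > 7)         /* CINFO: window size up to 32K */
        return 0;
    /* FCHECK: CMF*256 + FLG must be a multiple of 31 */
    return (cmf * 256u + flg) % 31u == 0;
}
```

Applied to the bytes in the question, read_le32 gives 170778 for 1A 9B 02 00, and the header check accepts both 78 5E and 78 9C but rejects 1A 9B, which is consistent with inflate reporting an invalid header when handed the data from offset 0 rather than offset 4.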

You didn't mention one obvious escape, which is the backslash itself. \\ should go to \. Maybe that's what you're missing. This simple un-escaper in C worked:

#include <stdio.h>

int main(void)
{
    int ch;

    while ((ch = getchar()) != EOF) {
        if (ch == '\\') {
            ch = getchar();
            if (ch == EOF)
                break;
            ch =
                ch == 'n' ? '\n' :
                ch == 'r' ? '\r' :
                ch == '0' ? 0 :
                ch == 't' ? '\t' :
                ch == 'q' ? '"' :
                ch == 's' ? '\'' :
                ch == 'd' ? '$' :
                ch == 'p' ? '%' : ch;
        }
        putchar(ch);
    }
    return 0;
}
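
One caveat with a stdin/stdout filter: on platforms where the standard streams default to text mode (Windows, for example), newline translation can alter binary bytes on the way in or out. A buffer-based variant sidesteps that if the file is read and written in binary mode. This is only a sketch applying the same escape table as above; unescape_buffer is a hypothetical name, and the caller is assumed to supply an output buffer at least as large as the input.

```c
#include <stddef.h>

/* Hypothetical helper: decode the backslash escapes in in[0..n-1] into out.
   The output is never longer than the input, so out needs room for n bytes.
   Returns the number of bytes written. */
size_t unescape_buffer(const char *in, size_t n, unsigned char *out)
{
    size_t i = 0, o = 0;

    while (i < n) {
        unsigned char ch = (unsigned char)in[i++];
        if (ch == '\\') {
            if (i == n)         /* dangling backslash at end: stop */
                break;
            ch = (unsigned char)in[i++];
            switch (ch) {
            case 'n': ch = '\n'; break;
            case 'r': ch = '\r'; break;
            case '0': ch = 0;    break;
            case 't': ch = '\t'; break;
            case 'q': ch = '"';  break;
            case 's': ch = '\''; break;
            case 'd': ch = '$';  break;
            case 'p': ch = '%';  break;
            default:             break;  /* e.g. \\ becomes \ */
            }
        }
        out[o++] = ch;
    }
    return o;
}
```

The first four bytes of the decoded buffer would then be the length prefix, and the zlib stream to pass to inflate would start at offset 4.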