I am interfacing with a server that requires that data sent to it is compressed with Deflate algorithm (Huffman encoding + LZ77) and also sends data that I need to Inflate.
I know that Python includes Zlib, and that the C libraries in Zlib support calls to Inflate and Deflate, but these apparently are not provided by the Python Zlib module. It does provide Compress and Decompress, but when I make a call such as the following:
result_data = zlib.decompress( base64_decoded_compressed_string )
I receive the following error:
Error -3 while decompressing data: incorrect header check
Gzip does no better; when making a call such as:
result_data = gzip.GzipFile( fileobj = StringIO.StringIO( base64_decoded_compressed_string ) ).read()
I receive the error:
IOError: Not a gzipped file
which makes sense as the data is a Deflated file not a true Gzipped file.
Now I know that there is a Deflate implementation available (Pyflate), but I do not know of an Inflate implementation.
It seems that there are a few options:
- Find an existing implementation (ideal) of Inflate and Deflate in Python
- Write my own Python extension to the zlib c library that includes Inflate and Deflate
- Call something else that can be executed from the command line (such as a Ruby script, since Inflate/Deflate calls in zlib are fully wrapped in Ruby)
- ?
I am seeking a solution, but lacking a solution I will be thankful for insights, constructive opinions, and ideas.
Additional information: The result of deflating (and encoding) a string should, for the purposes I need, give the same result as the following snippet of C# code, where the input parameter is an array of UTF bytes corresponding to the data to compress:
public static string DeflateAndEncodeBase64(byte[] data)
{
if (null == data || data.Length < 1) return null;
string compressedBase64 = "";
//write into a new memory stream wrapped by a deflate stream
using (MemoryStream ms = new MemoryStream())
{
using (DeflateStream deflateStream = new DeflateStream(ms, CompressionMode.Compress, true))
{
//write byte buffer into memorystream
deflateStream.Write(data, 0, data.Length);
deflateStream.Close();
//rewind memory stream and write to base 64 string
byte[] compressedBytes = new byte[ms.Length];
ms.Seek(0, SeekOrigin.Begin);
ms.Read(compressedBytes, 0, (int)ms.Length);
compressedBase64 = Convert.ToBase64String(compressedBytes);
}
}
return compressedBase64;
}
Running this .NET code for the string "deflate and encode me" gives the result
7b0HYBxJliUmL23Ke39K9UrX4HShCIBgEyTYkEAQ7MGIzeaS7B1pRyMpqyqBymVWZV1mFkDM7Z28995777333nvvvfe6O51OJ/ff/z9cZmQBbPbOStrJniGAqsgfP358Hz8iZvl5mbV5mi1nab6cVrM8XeT/Dw==
When "deflate and encode me" is run through the Python Zlib.compress() and then base64 encoded, the result is "eJxLSU3LSSxJVUjMS1FIzUvOT0lVyE0FAFXHB6k=".
It is clear that zlib.compress() is not an implementation of the same algorithm as the standard Deflate algorithm.
More Information:
The first 2 bytes of the .NET deflate data ("7b0HY..."), after b64 decoding are 0xEDBD, which does not correspond to Gzip data (0x1f8b), BZip2 (0x425A) data, or Zlib (0x789C) data.
The first 2 bytes of the Python compressed data ("eJxLS..."), after b64 decoding are 0x789C. This is a Zlib header.
SOLVED
To handle the raw deflate and inflate, without header and checksum, the following things needed to happen:
On deflate/compress: strip the first two bytes (header) and the last four bytes (checksum).
On inflate/decompress: there is a second argument for window size. If this value is negative it suppresses headers. here are my methods currently, including the base64 encoding/decoding - and working properly:
import zlib
import base64
def decode_base64_and_inflate( b64string ):
decoded_data = base64.b64decode( b64string )
return zlib.decompress( decoded_data , -15)
def deflate_and_base64_encode( string_val ):
zlibbed_str = zlib.compress( string_val )
compressed_string = zlibbed_str[2:-4]
return base64.b64encode( compressed_string )