The chipz decompression library has an extremely useful function, make-decompressing-stream, which provides an interface (using Gray streams behind the scenes) to transparently decompress data read from the provided stream. This lets me write a single function, read-tag (which reads a single "tag" from a stream of structured binary data, much like Common Lisp's read function reads a single Lisp "form" from a stream), that works on both compressed and uncompressed data, e.g.:
;; For uncompressed data:
(read-tag in-stream)
;; For compressed data:
(read-tag (chipz:make-decompressing-stream 'chipz:zlib in-stream))
As far as I can tell, the API of the associated compression library, salza2, doesn't provide an (out-of-the-box) equivalent interface for performing the reverse task. How could I implement such an interface myself? Let's call it make-compressing-stream. It will be used with my own complementary write-tag function, and provide the same benefits as for reading:
;; For uncompressed data:
(write-tag out-stream current-tag)
;; For compressed data:
(write-tag (make-compressing-stream 'salza2:zlib-compressor out-stream)
           current-tag)
In salza2's documentation (linked above), in the overview, it says: "Salza2 provides an interface for creating a compressor object. This object acts as a sink for octets (either individual or vectors of octets), and is a source for octets in a compressed data format. The compressed octet data is provided to a user-defined callback that can write it to a stream, copy it to another vector, etc." For my current purposes, I only require compression in zlib and gzip formats, for which standard compressors are provided.
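To make that overview concrete, here is a minimal sketch of how I read it. The function name zlib-compress-demo is my own invention for illustration, and I'm assuming the callback receives the output buffer plus an end index and that the buffer may be reused between calls, so I may not have the contract exactly right.

```lisp
;; Assumes Quicklisp is installed; salza2 is the only dependency.
(ql:quickload "salza2" :silent t)

;; ZLIB-COMPRESS-DEMO is a hypothetical name, not part of salza2.
(defun zlib-compress-demo (octets)
  "Feed one extra octet and then OCTETS, a (simple-array (unsigned-byte 8) (*)),
into a zlib compressor. Return the compressed output as a list of chunks,
one chunk per callback invocation."
  (let ((chunks '()))
    (salza2:with-compressor (compressor 'salza2:zlib-compressor
                             :callback (lambda (buffer end)
                                         ;; Copy the valid part of BUFFER, on the
                                         ;; assumption that the compressor may
                                         ;; reuse it after the callback returns.
                                         (push (subseq buffer 0 end) chunks)))
      ;; The compressor is a sink for a single octet ...
      (salza2:compress-octet 0 compressor)
      ;; ... or for a whole vector of octets.
      (salza2:compress-octet-vector octets compressor))
    ;; WITH-COMPRESSOR finishes the compression when its body exits,
    ;; so every chunk has been delivered by this point.
    (nreverse chunks)))
```

Here the callback only collects the chunks in memory; writing them to a stream instead would just mean swapping the lambda.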
So here's how I think it could be done: first, convert my "tag" object to an octet vector; second, compress it using salza2:compress-octet-vector; and third, provide a callback function that writes the compressed data directly to a file. From reading around, I think the first step could be achieved using flexi-streams:with-output-to-sequence (see here), but I'm really not sure about the callback function, despite looking at salza2's source.
But here's the thing: a single tag can contain an arbitrary number of arbitrarily nested tags, and the "leaf" tags of this structure can each carry a sizeable payload; in other words, a single tag can be quite a lot of data.
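For reference, the three-step plan might look roughly like the sketch below, which holds the whole serialised tag in memory at once. The write-tag here is only a hypothetical stand-in for my real function, write-compressed-tag is a made-up name, and I'm assuming that salza2:make-stream-output-callback is the right way to get a callback that writes to a stream.

```lisp
;; Assumes Quicklisp is installed.
(ql:quickload '("salza2" "flexi-streams") :silent t)

;; Hypothetical stand-in for my real WRITE-TAG: here a "tag" is simply
;; a vector of octets that gets written out verbatim.
(defun write-tag (stream tag)
  (write-sequence tag stream))

;; WRITE-COMPRESSED-TAG is a made-up name for illustration.
(defun write-compressed-tag (tag pathname)
  "Serialise TAG to octets, compress them in zlib format, write to PATHNAME."
  ;; Step 1: serialise the tag into an in-memory octet vector. The result
  ;; is coerced to a simple octet vector on the assumption that salza2
  ;; expects (simple-array (unsigned-byte 8) (*)).
  (let ((octets (coerce (flexi-streams:with-output-to-sequence (out)
                          (write-tag out tag))
                        '(simple-array (unsigned-byte 8) (*)))))
    (with-open-file (file pathname
                          :direction :output
                          :if-exists :supersede
                          :element-type '(unsigned-byte 8))
      ;; Steps 2 and 3: compress the octets, with a callback that writes
      ;; each piece of compressed output straight to the file.
      (salza2:with-compressor (compressor 'salza2:zlib-compressor
                               :callback (salza2:make-stream-output-callback file))
        (salza2:compress-octet-vector octets compressor)))
    pathname))
```

This works on one complete octet vector, which is exactly what becomes a problem for large tags.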
So the tag->uncompressed-octets->compressed-octets->file conversion would ideally be performed in chunks, and this raises a question that I don't know how to answer. Compression formats, AIUI, tend to store a checksum of their payload data in their headers. If I compress the data one chunk at a time and append each compressed chunk to an output file, surely there will be a header and checksum for each chunk, rather than the single header and checksum for the entire tag's data that I want? How can I solve this problem? Or is it already handled by salza2?
Thanks for any help, sorry for rambling :)