0
votes

I would like to store data in a compressed format between various applications (some in Python, some in Java, etc.) in such a way that:

  • the producer application can choose from among several formats (e.g. gzip/zstd/zlib/brotli)
  • the consumer application has all the information it needs to uncompress the data

Once the data is uncompressed, all applications know how to deal with the resulting information.

Is there a common/standard container format which includes the compression algorithm type? (e.g. prepending the compressed data with a MIME type in ASCII) Or do the compressed data from most methods already contain a header and magic number that allow the compression type to be determined?

1
stackoverflow.com/questions/39008957/… seems to indicate that brotli isn't autodetectable :/ - Jason S
zstd starts with hex 28 b5 2f fd (github.com/facebook/zstd/blob/dev/doc/…) - Jason S

1 Answer

2
votes

The zip format is quite common, and it records the compression method used for each entry. It has method numbers for deflate (8), the algorithm underlying both gzip and zlib, as well as zstd (93) and xz (95), but none for brotli yet.
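To illustrate how a zip container carries the method number alongside the data, here is a minimal sketch using Python's standard `zipfile` module. The helper names `pack` and `unpack` and the entry name `payload.bin` are made up for this example. Only deflate and stored are shown, since support for other method numbers (such as zstd) depends on the Python version and is not assumed here.

```python
import io
import zipfile
from typing import Tuple


def pack(payload: bytes, method: int = zipfile.ZIP_DEFLATED) -> bytes:
    """Wrap payload in an in-memory zip archive using the given method (hypothetical helper)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=method) as zf:
        # "payload.bin" is an arbitrary entry name chosen for this sketch.
        zf.writestr("payload.bin", payload)
    return buf.getvalue()


def unpack(container: bytes) -> Tuple[int, bytes]:
    """Return (method number, uncompressed bytes) for the first entry (hypothetical helper)."""
    with zipfile.ZipFile(io.BytesIO(container)) as zf:
        info = zf.infolist()[0]
        # compress_type is the method number stored in the zip headers.
        return info.compress_type, zf.read(info)


if __name__ == "__main__":
    method, data = unpack(pack(b"hello world" * 10))
    print(method, len(data))
```

The consumer never has to be told which method the producer chose: it reads `compress_type` from the archive, and the zip library picks the matching decompressor if it supports that method.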

As for individual wrappers, that's what zlib and gzip are: thin headers and trailers around deflate data. zstd also has a detectable wrapper. Raw brotli, however, is difficult to detect, and I am not aware of consistent use of a brotli wrapper. See this lovely answer for why. There was a proposal for a brotli wrapper (also lovely), but I don't think it is in use.
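A rough sketch of what header sniffing could look like in Python, assuming the usual signatures: gzip streams start with `1f 8b`, zstd frames with `28 b5 2f fd` (as noted in the comment on the question), and a zlib stream has a two-byte header whose low nibble is 8 and whose 16-bit big-endian value is a multiple of 31. The function name `sniff` is hypothetical. The zlib check is only a heuristic: raw brotli or other headerless data can pass it by coincidence, so anything not positively identified is reported as unknown.

```python
import gzip
import zlib

GZIP_MAGIC = b"\x1f\x8b"
ZSTD_MAGIC = b"\x28\xb5\x2f\xfd"


def sniff(data: bytes) -> str:
    """Guess the wrapper format from the leading bytes (hypothetical helper)."""
    if data[:2] == GZIP_MAGIC:
        return "gzip"
    if data[:4] == ZSTD_MAGIC:
        return "zstd"
    if len(data) >= 2:
        cmf, flg = data[0], data[1]
        # zlib header: method 8 (deflate), window size field <= 7,
        # and the two header bytes form a multiple of 31.
        if cmf & 0x0F == 8 and cmf >> 4 <= 7 and ((cmf << 8) | flg) % 31 == 0:
            return "zlib"
    # Raw brotli has no magic number, so it ends up here with everything else.
    return "unknown"


if __name__ == "__main__":
    print(sniff(gzip.compress(b"example")))
    print(sniff(zlib.compress(b"example")))
```

Because brotli cannot be recognized this way, a producer that may emit brotli still needs an explicit tag or container around the data.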