I am storing many chunks of base64-encoded 64-bit doubles in an XML file. The double data all looks similar.
The double data is currently compressed with Java's 'Deflater' (the DEFLATE algorithm) before being encoded. However, each chunk of binary data in the file effectively carries its own deflate data dictionary, an overhead I would like to greatly reduce. The 'Deflater' class has a 'setDictionary' method that I would like to use.
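For context, this is roughly how I intend to use it on the compression side (the class name, method name, and buffer sizing are just for illustration):

```java
import java.util.Arrays;
import java.util.zip.Deflater;

// Sketch: compress one chunk against a shared preset dictionary.
public class ChunkCompressor {

    // Uses the default zlib wrapper (nowrap = false). The dictionary must be
    // set before any input is supplied. As I understand it, the dictionary
    // bytes themselves are NOT written into the output; only their Adler-32
    // checksum ends up in the zlib header.
    public static byte[] compress(byte[] chunk, byte[] sharedDict) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setDictionary(sharedDict);
        deflater.setInput(chunk);
        deflater.finish();
        byte[] buf = new byte[chunk.length * 2 + 64]; // ample for a sketch
        int n = deflater.deflate(buf);
        deflater.end();
        return Arrays.copyOf(buf, n);
    }
}
```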
So questions are:
1). Does anyone have any suggestions for how best to build my own single custom data dictionary from multiple sections of doubles (8 bytes each) that could be used for multiple deflate operations, i.e. use the same dictionary for all the compressions? Should I be looking for common byte sequences across all the byte arrays, with the commonest ones put at the end of the dictionary array?
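To make question 1 concrete, here is one naive scheme I was considering: count fixed-width byte n-grams across all sample chunks and concatenate the most frequent ones, commonest last (since the zlib documentation suggests putting the most used strings towards the end of the dictionary). The class name, the n-gram width, and the whole approach are just my guess, which is why I am asking:

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Naive sketch: build a shared preset dictionary from sample chunks by
// frequency-counting fixed-width byte n-grams.
public class DictBuilder {

    public static byte[] build(List<byte[]> samples, int gram, int maxDictLen) {
        Map<String, Integer> counts = new HashMap<>();
        for (byte[] s : samples) {
            for (int i = 0; i + gram <= s.length; i += gram) {
                // ISO-8859-1 maps bytes 1:1 to chars, so the String is a safe map key.
                String key = new String(s, i, gram, StandardCharsets.ISO_8859_1);
                counts.merge(key, 1, Integer::sum);
            }
        }
        // Sort ascending by frequency so the commonest n-grams land at the end.
        List<String> grams = new ArrayList<>(counts.keySet());
        grams.sort(Comparator.comparingInt(counts::get));
        // Keep only as many of the most frequent grams as fit in maxDictLen.
        int keep = Math.min(grams.size(), maxDictLen / gram);
        StringBuilder sb = new StringBuilder();
        for (String g : grams.subList(grams.size() - keep, grams.size())) {
            sb.append(g);
        }
        return sb.toString().getBytes(StandardCharsets.ISO_8859_1);
    }
}
```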
2). Can I keep the (custom) data dictionary separate from the deflated data, and then set the dictionary against the deflated data later, just before inflating it again?
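In other words, for question 2, is something along these lines on the inflate side valid? As I read the Inflater API, inflate() returns 0 and needsDictionary() becomes true when the stream was deflated against a preset dictionary, at which point I can supply the dictionary I stored elsewhere (names here are illustrative):

```java
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

// Sketch: inflate a chunk, supplying the externally stored shared dictionary
// only when the Inflater asks for it.
public class ChunkDecompressor {

    public static byte[] decompress(byte[] compressed, byte[] sharedDict, int maxLen) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed);
            byte[] out = new byte[maxLen];
            int n = inflater.inflate(out);
            if (n == 0 && inflater.needsDictionary()) {
                // Could verify inflater.getAdler() against the stored
                // dictionary's checksum here before trusting it.
                inflater.setDictionary(sharedDict);
                n = inflater.inflate(out);
            }
            return Arrays.copyOf(out, n);
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt chunk", e);
        } finally {
            inflater.end();
        }
    }
}
```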
3). Will the deflate algorithm take my custom data dictionary and then just build its own slightly different dictionary anyway, invalidating my single dictionary and lessening the potential space saving from using it?
4). Can someone elaborate on the structure of zlib compressed data, so that I can try to separate the data dictionary from the compressed data myself?
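For question 4, from what I have read of RFC 1950, the zlib wrapper that Deflater writes by default looks like: [CMF][FLG], then a 4-byte DICTID (only if the FDICT flag is set), then the deflate data, then the Adler-32 of the plaintext. That seems to imply the dictionary itself is never stored in the stream, only its Adler-32 checksum. This is my sketch for probing the header (the helper name is mine, and my reading of the spec may be wrong):

```java
// Sketch: inspect the zlib header (RFC 1950) of a compressed chunk.
public class ZlibHeader {

    // Returns the DICTID (Adler-32 of the preset dictionary) if the FDICT
    // flag is set, or -1 if no preset dictionary was used.
    public static long readDictId(byte[] zlibStream) {
        int cmf = zlibStream[0] & 0xff;   // compression method + window size
        int flg = zlibStream[1] & 0xff;   // flags; bit 5 (0x20) is FDICT
        if ((cmf * 256 + flg) % 31 != 0)
            throw new IllegalArgumentException("bad zlib header check");
        if ((flg & 0x20) == 0)
            return -1;                    // no preset dictionary
        return ((long) (zlibStream[2] & 0xff) << 24)
                | ((zlibStream[3] & 0xff) << 16)
                | ((zlibStream[4] & 0xff) << 8)
                | (zlibStream[5] & 0xff);
    }
}
```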
I want to spend the space for the data dictionary only once in my file, use it for every block of my double data, and not store it alongside each block. If the data dictionary cannot be separated from the deflated data and stored on its own, then it seems there would be little value in building a single custom dictionary, since each compressed block would carry its own dictionary anyway. Is this right?