0
votes

I have a buffer (string) that grows over time, and I need to send this buffer through a channel with a limited input size (4096 bytes). Communication through this channel is costly, which is why it is better to send compressed data. The buffer grows in blocks of different sizes. These blocks cannot be split, or their meaning is lost.

I am currently using zlib in C++ for compression with an arbitrary buffer size limit. When this limit is reached, the string is compressed and sent through the channel. This works, but it is not optimal because the limit has to be rather low to avoid losing information (channel input limit of 4096 bytes).

My idea is to use zlib to build a growing compressed buffer from compression blocks of different sizes, and to stop the process before reaching the channel input limit. Does zlib allow working with compression blocks of different sizes, or do I need another algorithm?

3
No idea about zlib really, but have a look at LZMA, which I think could handle your situation. 7-zip.org/sdk.html - antipattern

3 Answers

1
votes

The easiest solution is to convert the out-of-band packet delineation into an in-band format. This is simplest when your input blocks do not use all 256 possible byte values. E.g. when the value 00 never occurs in a block, it can be used to separate blocks prior to compression. Otherwise, you'll need an escape code.

Either way, you compress the continuous stream with its block separators. On the receiving side you decompress the stream, recognize the separators, and reassemble the blocks.
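A minimal sketch of the escape-code variant, for blocks that may contain any byte value. The names `appendBlock` and `splitBlocks` and the choice of 0x00 as separator and 0x01 as escape byte are assumptions made for illustration; they do not come from zlib or from the question.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical framing constants: 0x00 separates blocks, 0x01 is the escape byte.
constexpr uint8_t SEP = 0x00;
constexpr uint8_t ESC = 0x01;

// Append one block to the stream that will be compressed.
// SEP and ESC bytes inside the block are replaced by ESC followed by the
// byte value plus 2, so a raw SEP only ever marks the end of a block.
void appendBlock(std::string &stream, const std::string &block) {
    for (unsigned char c : block) {
        if (c == SEP || c == ESC) {
            stream.push_back(static_cast<char>(ESC));
            stream.push_back(static_cast<char>(c + 2)); // 0x00 -> 0x02, 0x01 -> 0x03
        } else {
            stream.push_back(static_cast<char>(c));
        }
    }
    stream.push_back(static_cast<char>(SEP));
}

// Split a decompressed stream back into the original blocks.
std::vector<std::string> splitBlocks(const std::string &stream) {
    std::vector<std::string> blocks;
    std::string current;
    for (std::size_t i = 0; i < stream.size(); ++i) {
        unsigned char c = static_cast<unsigned char>(stream[i]);
        if (c == SEP) {
            blocks.push_back(current);
            current.clear();
        } else if (c == ESC && i + 1 < stream.size()) {
            // Undo the escape: the next byte holds the original value plus 2.
            unsigned char escaped = static_cast<unsigned char>(stream[++i]);
            current.push_back(static_cast<char>(escaped - 2));
        } else {
            current.push_back(static_cast<char>(c));
        }
    }
    return blocks;
}
```

The sender would call `appendBlock` for each new block and feed the resulting bytes to the compressor; the receiver would call `splitBlocks` on the decompressed bytes. Because the separators travel inside the compressed stream, the channel packets can be cut at any byte position.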

1
votes

You can simply do continuous zlib compression, sending data on your channel every time 4K of compressed data has been generated. On the other end you need to ensure that the decompressor is fed the 4K blocks of compressed data in the correct order.

The deflate algorithm in zlib is bursty, accumulating on the order of 16K to 64K or more of data internally before emitting any compressed data, and then delivering a block of compressed data, and then accumulating again. So there will be a latency unless you request that deflate flush data. You can have smaller blocks by flushing, with some small impact on compression, if you would like to reduce the latency.
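The slicing and in-order reassembly described in this answer could be sketched as follows. This covers only the bookkeeping around the compressor, not the zlib calls themselves. `PacketSlicer`, `Packet`, and `reassemble` are hypothetical names, and the explicit sequence number is an assumption: a channel that already preserves order would not need it.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Channel input limit taken from the question.
constexpr std::size_t CHANNEL_LIMIT = 4096;

// Hypothetical packet: a sequence number plus at most CHANNEL_LIMIT bytes
// of compressed data.
struct Packet {
    std::uint32_t seq;
    std::vector<std::uint8_t> payload;
};

// Buffers compressed bytes as the compressor produces them and cuts packets.
class PacketSlicer {
public:
    // Add newly produced compressed bytes; return every packet that is now full.
    std::vector<Packet> push(const std::vector<std::uint8_t> &compressed) {
        pending_.insert(pending_.end(), compressed.begin(), compressed.end());
        std::vector<Packet> ready;
        while (pending_.size() >= CHANNEL_LIMIT) {
            ready.push_back(cut(CHANNEL_LIMIT));
        }
        return ready;
    }

    // Return the last, possibly shorter, packet once the stream is finished.
    Packet finish() { return cut(pending_.size()); }

private:
    // Remove the first n pending bytes and wrap them in a numbered packet.
    Packet cut(std::size_t n) {
        Packet p{nextSeq_++,
                 std::vector<std::uint8_t>(pending_.begin(), pending_.begin() + n)};
        pending_.erase(pending_.begin(), pending_.begin() + n);
        return p;
    }

    std::vector<std::uint8_t> pending_;
    std::uint32_t nextSeq_ = 0;
};

// Receiver side: put the packets back in order before feeding them to inflate.
std::vector<std::uint8_t> reassemble(const std::vector<Packet> &packets) {
    std::map<std::uint32_t, const Packet *> ordered;
    for (const Packet &p : packets) {
        ordered[p.seq] = &p;
    }
    std::vector<std::uint8_t> stream;
    for (const auto &entry : ordered) {
        stream.insert(stream.end(), entry.second->payload.begin(),
                      entry.second->payload.end());
    }
    return stream;
}
```

Packet boundaries carry no meaning here: the decompressor only needs the bytes in their original order, so a packet may end in the middle of a deflate block.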

0
votes

I succeeded in designing a compressor that sends the growing buffer part by part through the channel with a limited input size. I am posting the answer here for anyone working on the same problem. Thanks to Mark Adler and MSalters for putting me on the right path.

#include <zlib.h>

#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// SEND (and RECEIVE in the use case) stand for the application's own channel
// functions; they are not part of zlib.
class zStreamManager {
    public:
        zStreamManager() = default;
        ~zStreamManager();
        void endStream();
        void addToStream(const void *inData, size_t inDataSize);

    private:
        // Base64 output is 4 * ceil(n / 3) characters for n input bytes, so
        // 3050 zipped bytes become 4068 characters, which fits in the
        // 4096-byte channel limit together with the one-character delimiter.
        static constexpr size_t CHUNK_IN = 1024, CHUNK_OUT = 3050;
        const std::string base64Chars =
         "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
         "abcdefghijklmnopqrstuvwxyz"
         "0123456789+/";
        bool deallocated = true;
        z_stream stream;
        std::vector<uint8_t> outBuffer;
        std::string base64Encode(std::vector<uint8_t> &str);
};

zStreamManager::~zStreamManager() {
    endStream();
}

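void zStreamManager::endStream() {
    if(!deallocated) {
        deallocated = true;
        // The output buffer used by addToStream() no longer exists, so give
        // deflate a fresh one before finishing the stream.
        std::vector<uint8_t> tempBuffer(CHUNK_IN);
        int response = Z_OK;

        stream.next_in = Z_NULL;
        stream.avail_in = 0;

        while(response == Z_OK) {
            stream.next_out = &tempBuffer[0];
            stream.avail_out = CHUNK_IN;
            response = deflate(&stream, Z_FINISH);

            // Keep whatever this call produced; Z_STREAM_END ends the loop.
            size_t have = CHUNK_IN - stream.avail_out;
            outBuffer.insert(outBuffer.end(), tempBuffer.begin(), tempBuffer.begin() + have);
        }

        deflateEnd(&stream);

        // The final flush can push outBuffer past CHUNK_OUT, so keep cutting
        // full parts and mark only the last one with '$'.
        while(outBuffer.size() > CHUNK_OUT) {
            std::vector<uint8_t> zipped(outBuffer.begin(), outBuffer.begin() + CHUNK_OUT);
            outBuffer.erase(outBuffer.begin(), outBuffer.begin() + CHUNK_OUT);
            SEND << base64Encode(zipped) << "|";
        }

        if(outBuffer.size()) {
            SEND << base64Encode(outBuffer) << "$";
            outBuffer.clear();
        }
    }
}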

void zStreamManager::addToStream(const void *inData, size_t inDataSize) {
    if(inDataSize == 0)
        return; // nothing to compress

    if(deallocated) {
        stream.zalloc = Z_NULL;
        stream.zfree = Z_NULL;
        stream.opaque = Z_NULL;
        if(deflateInit(&stream, Z_BEST_COMPRESSION) != Z_OK)
            return; // zlib could not be initialised
        deallocated = false;
    }

    std::vector<uint8_t> tempBuffer(CHUNK_IN);

    stream.next_in = static_cast<Bytef *>(const_cast<void*>(inData));
    stream.avail_in = inDataSize;

    // Z_SYNC_FLUSH makes deflate emit everything it has so far. The flush is
    // complete only when deflate returns with output space left over.
    do {
        stream.next_out = &tempBuffer[0];
        stream.avail_out = CHUNK_IN;
        deflate(&stream, Z_SYNC_FLUSH);

        size_t have = CHUNK_IN - stream.avail_out;
        outBuffer.insert(outBuffer.end(), tempBuffer.begin(), tempBuffer.begin() + have);
    } while(stream.avail_out == 0);

    // Send every complete part, base64 encoded; the remainder waits for more
    // data or for endStream().
    while(outBuffer.size() >= CHUNK_OUT) {
        std::vector<uint8_t> zipped(outBuffer.begin(), outBuffer.begin() + CHUNK_OUT);
        outBuffer.erase(outBuffer.begin(), outBuffer.begin() + CHUNK_OUT);
        SEND << base64Encode(zipped) << "|";
    }
}

std::string zStreamManager::base64Encode(std::vector<uint8_t> &str) {
    /* ALTERED VERSION OF René Nyffenegger BASE64 CODE
   Copyright (C) 2004-2008 René Nyffenegger

   This source code is provided 'as-is', without any express or implied
   warranty. In no event will the author be held liable for any damages
   arising from the use of this software.

   Permission is granted to anyone to use this software for any purpose,
   including commercial applications, and to alter it and redistribute it
   freely, subject to the following restrictions:

   1. The origin of this source code must not be misrepresented; you must not
      claim that you wrote the original source code. If you use this source code
      in a product, an acknowledgment in the product documentation would be
      appreciated but is not required.

   2. Altered source versions must be plainly marked as such, and must not be
      misrepresented as being the original source code.

   3. This notice may not be removed or altered from any source distribution.

   René Nyffenegger [email protected]
    */
  unsigned char const* bytes_to_encode = &str[0];
  unsigned int in_len = str.size();
  std::string ret;
  int i = 0, j = 0;
  unsigned char char_array_3[3], char_array_4[4];

  while(in_len--) {
    char_array_3[i++] = *(bytes_to_encode++);
    if (i == 3) {
      char_array_4[0] = (char_array_3[0] & 0xfc) >> 2;
      char_array_4[1] = ((char_array_3[0] & 0x03) << 4) + ((char_array_3[1] & 0xf0) >> 4);
      char_array_4[2] = ((char_array_3[1] & 0x0f) << 2) + ((char_array_3[2] & 0xc0) >> 6);
      char_array_4[3] = char_array_3[2] & 0x3f;

      for(i = 0; (i <4) ; i++)
        ret += base64Chars[char_array_4[i]];
      i = 0;
    }
  }

  if(i) {
    for(j = i; j < 3; j++)
      char_array_3[j] = '\0';

    char_array_4[0] = (char_array_3[0] & 0xfc) >> 2;
    char_array_4[1] = ((char_array_3[0] & 0x03) << 4) + ((char_array_3[1] & 0xf0) >> 4);
    char_array_4[2] = ((char_array_3[1] & 0x0f) << 2) + ((char_array_3[2] & 0xc0) >> 6);
    char_array_4[3] = char_array_3[2] & 0x3f;

    for(j = 0; (j < i + 1); j++)
      ret += base64Chars[char_array_4[j]];

    while((i++ < 3))
      ret += '=';
  }

  return ret;
}

A use case:

zStreamManager zm;
string growingBuffer = "";
bool somethingToSend = true;

while(somethingToSend) {
  RECEIVE(&growingBuffer);
  if(growingBuffer.size()) {
    zm.addToStream(growingBuffer.c_str(), growingBuffer.size());
    growingBuffer.clear();
  } else {
    somethingToSend = false;
  }
}

zm.endStream();

RECEIVE and SEND stand for the methods used for receiving the buffer and sending it through the channel. On the receiving side, the parts are separated by the '|' character and the end of the whole buffer is marked with '$'. Each part must be base64 decoded, then the parts are concatenated. Finally, the result can be decompressed with zlib like any other compressed data.
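The receiving steps before decompression could look roughly like this. It is a sketch under the assumption that every part was base64 encoded with the standard alphabet and '=' padding before being sent; `base64Decode` and `collectParts` are hypothetical helper names, and the final zlib inflate step is left out.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Decode one base64 part (standard alphabet, '=' padding) into raw bytes.
std::vector<uint8_t> base64Decode(const std::string &text) {
    static const std::string alphabet =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "abcdefghijklmnopqrstuvwxyz"
        "0123456789+/";
    std::vector<uint8_t> out;
    uint32_t bits = 0; // accumulator for the bits read so far
    int bitCount = 0;  // number of bits in the accumulator not yet output
    for (char c : text) {
        std::string::size_type pos = alphabet.find(c);
        if (pos == std::string::npos)
            break; // '=' padding or an invalid character ends the part
        bits = (bits << 6) | static_cast<uint32_t>(pos);
        bitCount += 6;
        if (bitCount >= 8) {
            bitCount -= 8;
            out.push_back(static_cast<uint8_t>((bits >> bitCount) & 0xFF));
        }
    }
    return out;
}

// Rebuild the compressed stream from the received text. Parts end with '|',
// the last part ends with '$'. Each part is decoded on its own because every
// part carries its own padding. Returns true once the '$' terminator is seen.
bool collectParts(const std::string &received, std::vector<uint8_t> &compressed) {
    std::string part;
    for (char c : received) {
        if (c == '|' || c == '$') {
            std::vector<uint8_t> bytes = base64Decode(part);
            compressed.insert(compressed.end(), bytes.begin(), bytes.end());
            part.clear();
            if (c == '$')
                return true;
        } else {
            part.push_back(c);
        }
    }
    return false; // stream not finished yet
}
```

Once `collectParts` returns true, `compressed` holds the complete zlib stream and can be passed to inflate. Neither '|' nor '$' belongs to the base64 alphabet, which is why they are safe as delimiters.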