
I want to compress/decompress and serialize/deserialize String content. I'm using the following two static functions.

import static com.google.common.base.Preconditions.checkNotNull;

import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;
import javax.annotation.Nonnull;

/**
 * Compress data based on the {@link Deflater}.
 * 
 * @param pToCompress
 *            input byte-array
 * @return compressed byte-array
 * @throws NullPointerException
 *             if {@code pToCompress} is {@code null}
 */
public static byte[] compress(@Nonnull final byte[] pToCompress) {
    checkNotNull(pToCompress);

    // Create the compressor. It allocates native zlib memory, which must be
    // released with end() once we are done.
    final Deflater compressor = new Deflater(Deflater.BEST_SPEED);
    try {
        // Give the compressor the data to compress.
        compressor.setInput(pToCompress);
        compressor.finish();

        /*
         * Create an expandable byte array to hold the compressed data.
         * You cannot use an array that's the same size as the original because
         * there is no guarantee that the compressed data will be smaller than
         * the uncompressed data.
         */
        final ByteArrayOutputStream bos = new ByteArrayOutputStream(pToCompress.length);
        final byte[] buf = new byte[1024];
        while (!compressor.finished()) {
            final int count = compressor.deflate(buf);
            bos.write(buf, 0, count);
        }

        // Get the compressed data.
        return bos.toByteArray();
    } finally {
        // Free the native memory immediately instead of waiting for the GC.
        compressor.end();
    }
}

/**
 * Decompress data based on the {@link Inflater}.
 * 
 * @param pCompressed
 *            compressed input byte-array
 * @return decompressed byte-array
 * @throws NullPointerException
 *             if {@code pCompressed} is {@code null}
 */
public static byte[] decompress(@Nonnull final byte[] pCompressed) {
    checkNotNull(pCompressed);

    // Create the decompressor and give it the data to decompress.
    final Inflater decompressor = new Inflater();
    try {
        decompressor.setInput(pCompressed);

        // Create an expandable byte array to hold the decompressed data.
        final ByteArrayOutputStream bos = new ByteArrayOutputStream(pCompressed.length);
        final byte[] buf = new byte[1024];
        while (!decompressor.finished()) {
            final int count = decompressor.inflate(buf);
            if (count == 0 && (decompressor.needsInput() || decompressor.needsDictionary())) {
                // Truncated input: without this check the loop would never end.
                throw new DataFormatException("Incomplete compressed data");
            }
            bos.write(buf, 0, count);
        }

        // Get the decompressed data.
        return bos.toByteArray();
    } catch (final DataFormatException e) {
        LOGWRAPPER.error(e.getMessage(), e);
        throw new RuntimeException(e);
    } finally {
        // Free the native memory immediately instead of waiting for the GC.
        decompressor.end();
    }
}

However, compared to non-compressed values it is orders of magnitude slower, even though I cache the decompressed result and only decompress values when the content is really needed.

The code is used for a DOM-like persistable tree structure, and XPath queries that force the decompression of the String values are about 50 times slower, if not more (not properly benchmarked; I just ran the unit tests). My laptop even freezes after some unit tests (every time; I checked about five times) because Eclipse stops responding due to heavy disk I/O and the like. I've already set the compression level to Deflater.BEST_SPEED, although other compression levels might be better; maybe I'll provide a configuration option that can be set per resource. Maybe I've messed something up, as I haven't used the Deflater before. I also only compress content whose String length is greater than 10.

Edit: After moving the Deflater instantiation into a static field, it seems that creating Deflater and Inflater instances is very costly: the performance bottleneck is gone, and without microbenchmarks or the like I can't see any performance loss :-) I now just reset the deflater/inflater before using a new input.
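A minimal sketch of the reuse idea from the edit, assuming Java 8+; the class name `ReusableCodec` is hypothetical. Instead of a plain static field it keeps one `Deflater` and one `Inflater` per thread via `ThreadLocal`, because neither class is thread-safe and a single shared static instance would be corrupted if several threads compress at once. `reset()` discards the previous input but keeps the compression level. The trade-off: thread-local instances are never `end()`ed explicitly, so their native zlib memory is only released once they are garbage-collected.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public final class ReusableCodec {
    // One instance per thread: reused across calls, but never shared between
    // threads, because Deflater and Inflater are not thread-safe.
    private static final ThreadLocal<Deflater> DEFLATER =
            ThreadLocal.withInitial(() -> new Deflater(Deflater.BEST_SPEED));
    private static final ThreadLocal<Inflater> INFLATER =
            ThreadLocal.withInitial(Inflater::new);

    private ReusableCodec() {}

    public static byte[] compress(byte[] input) {
        Deflater deflater = DEFLATER.get();
        deflater.reset(); // discard state from the previous input; the level is kept
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream bos = new ByteArrayOutputStream(input.length);
        byte[] buf = new byte[1024];
        while (!deflater.finished()) {
            bos.write(buf, 0, deflater.deflate(buf));
        }
        return bos.toByteArray();
    }

    public static byte[] decompress(byte[] compressed) {
        Inflater inflater = INFLATER.get();
        inflater.reset();
        inflater.setInput(compressed);
        ByteArrayOutputStream bos = new ByteArrayOutputStream(compressed.length);
        byte[] buf = new byte[1024];
        try {
            while (!inflater.finished()) {
                int count = inflater.inflate(buf);
                if (count == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    // Truncated input: stop instead of looping forever.
                    throw new DataFormatException("incomplete compressed data");
                }
                bos.write(buf, 0, count);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("corrupt compressed data", e);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) {
        byte[] original = "hello hello hello hello".getBytes(StandardCharsets.UTF_8);
        System.out.println(new String(decompress(compress(original)), StandardCharsets.UTF_8));
    }
}
```

Running `main` should print the original text again; the same thread reuses its compressor and decompressor on every call.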

How are you measuring the performance of your compression and decompression algorithms? Could you please share your idea/code regarding this? Are you calculating the compression ratio and compression time of your algorithm? If so, could you please share your experience with me? I have implemented compression and decompression algorithms for random strings, and now I need to measure their performance by calculating compression ratio, compression time, etc. Any kind of help will be appreciated. Thanks! (K M Rakibul Islam)
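A rough sketch of one way to get both numbers, assuming Java 11+ (for `String.repeat`); the class and method names are hypothetical. The ratio is taken as original size divided by compressed size, and the time is averaged over many runs with `System.nanoTime()`. This is still not a reliable microbenchmark (no control over JIT warm-up or dead-code elimination), so a dedicated harness such as JMH is the better tool for numbers you want to trust.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

public final class CompressionStats {
    // Compresses the input once with a fresh Deflater at the given level.
    public static byte[] deflate(byte[] input, int level) {
        Deflater deflater = new Deflater(level);
        try {
            deflater.setInput(input);
            deflater.finish();
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            byte[] buf = new byte[1024];
            while (!deflater.finished()) {
                bos.write(buf, 0, deflater.deflate(buf));
            }
            return bos.toByteArray();
        } finally {
            deflater.end(); // release native zlib memory right away
        }
    }

    // Original size / compressed size; values above 1.0 mean the data shrank.
    public static double compressionRatio(int originalSize, int compressedSize) {
        return (double) originalSize / compressedSize;
    }

    public static void main(String[] args) {
        byte[] input = "some text some text some text some text".repeat(100)
                .getBytes(StandardCharsets.UTF_8);
        int runs = 1000;
        byte[] compressed = null;
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) {
            compressed = deflate(input, Deflater.BEST_SPEED);
        }
        long avgNanos = (System.nanoTime() - start) / runs;
        System.out.printf("ratio %.2f, average time %d ns%n",
                compressionRatio(input.length, compressed.length), avgNanos);
    }
}
```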

2 Answers


Have you considered using a higher-level API such as GZIP?

Here is an example for compressing:

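import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPOutputStream;

public static byte[] compressToByte(final String data, final String encoding)
    throws IOException
{
    // String has length()/isEmpty(), not a length field.
    if (data == null || data.isEmpty())
    {
        return null;
    }
    else
    {
        byte[] bytes = data.getBytes(encoding);
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        // try-with-resources closes (and thereby finishes) the GZIP stream
        // before the result is read, even if write() throws.
        try (GZIPOutputStream os = new GZIPOutputStream(baos))
        {
            os.write(bytes, 0, bytes.length);
        }
        return baos.toByteArray();
    }
}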

Here is an example for uncompressing:

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;

public static String unCompressString(final byte[] data, final String encoding)
    throws IOException
{
    if (data == null || data.length == 0)
    {
        return null;
    }
    else
    {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        // try-with-resources closes the stream even if read() throws.
        try (GZIPInputStream is = new GZIPInputStream(new ByteArrayInputStream(data)))
        {
            byte[] tmp = new byte[256];
            int r;
            while ((r = is.read(tmp)) != -1)
            {
                buffer.write(tmp, 0, r);
            }
        }

        byte[] content = buffer.toByteArray();
        return new String(content, 0, content.length, encoding);
    }
}

We get very good performance and compression ratio with this.

The ZIP API is also an option.
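A sketch of the ZIP variant using `ZipOutputStream`/`ZipInputStream` from `java.util.zip`, assuming Java 8+ and that each value is stored as a single entry; the class name and the entry name `"value"` are arbitrary, hypothetical choices. For small strings the per-entry headers and the central directory make the output noticeably larger than GZIP's, so ZIP mainly pays off when several named entries belong in one blob.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public final class ZipStrings {
    // Hypothetical entry name; both sides only need to agree on it.
    private static final String ENTRY_NAME = "value";

    private ZipStrings() {}

    // Stores the UTF-8 bytes of data as the single entry of an in-memory ZIP archive.
    public static byte[] zip(String data) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(baos)) {
            zos.putNextEntry(new ZipEntry(ENTRY_NAME));
            zos.write(data.getBytes(StandardCharsets.UTF_8));
            zos.closeEntry();
        } catch (IOException e) {
            // Not expected for an in-memory stream; rethrow unchecked.
            throw new UncheckedIOException(e);
        }
        return baos.toByteArray();
    }

    // Reads the first entry back and decodes it as UTF-8.
    public static String unzip(byte[] zipped) {
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(zipped))) {
            if (zis.getNextEntry() == null) {
                throw new IOException("archive contains no entries");
            }
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[1024];
            int n;
            while ((n = zis.read(buf)) != -1) { // -1 marks the end of the current entry
                out.write(buf, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            // Corrupt or empty archives end up here.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(unzip(zip("some String content")));
    }
}
```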


Your comments are the correct answer.

In general, if a method is going to be used frequently, you want to eliminate any allocations and copying of data. This often means moving instance initialization and other setup either into static variables or into the constructor.

Using statics is easier, but you may run into lifetime issues (as in how do you know when to clean up the statics - do they exist forever?).

Doing the setup and initialization in the constructor allows the user of the class to determine the lifetime of the object and clean up appropriately. You could instantiate it once before going into a processing loop and GC it after exiting.
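A sketch of the constructor-based approach, assuming Java 8+; `StringCompressor` is a hypothetical name. The `Deflater` is created once in the constructor, `reset()` before each new input, and released in `close()`, so the caller controls its lifetime with try-with-resources around the processing loop. Like `Deflater` itself, the class is not thread-safe.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

/** Hypothetical helper that owns one Deflater for its whole lifetime. Not thread-safe. */
public final class StringCompressor implements AutoCloseable {
    private final Deflater deflater;
    private final byte[] buf = new byte[1024]; // scratch buffer reused across calls

    public StringCompressor(int level) {
        // The expensive setup happens once here, not on every compress() call.
        this.deflater = new Deflater(level);
    }

    public byte[] compress(String value) {
        byte[] input = value.getBytes(StandardCharsets.UTF_8);
        deflater.reset(); // forget the previous input, keep the level
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream bos = new ByteArrayOutputStream(input.length);
        while (!deflater.finished()) {
            bos.write(buf, 0, deflater.deflate(buf));
        }
        return bos.toByteArray();
    }

    @Override
    public void close() {
        // The caller decides the lifetime; end() frees the native zlib memory immediately.
        deflater.end();
    }

    public static void main(String[] args) {
        try (StringCompressor compressor = new StringCompressor(Deflater.BEST_SPEED)) {
            for (String s : new String[] {"first value", "second value", "third value"}) {
                System.out.println(s + " -> " + compressor.compress(s).length + " bytes");
            }
        }
    }
}
```

Because `close()` does not declare a checked exception, the try-with-resources block in `main` needs no `catch`.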