I think I'm missing something very simple. I have a byte array holding deflated data written into it using a Deflater:
deflate(outData, 0, BLOCK_SIZE, SYNC_FLUSH)
The reason I didn't just use GZIPOutputStream is that there are several threads (4 here, but the number is variable), each given a block of data, and each thread compresses its own block before storing the compressed data in a global byte array. If I used GZIPOutputStream, it would mess up the format, because each little block would get a header and trailer and become its own gzip stream (I only want to compress it).
So in the end, I've got this byte array, outData, that holds all of my compressed data, but I'm not really sure how to wrap it. GZIPOutputStream writes from a buffer of uncompressed data, but this array is all set. It's already compressed, and I'm hitting a wall trying to figure out how to get it into a valid gzip form.
EDIT: Ok, bad wording on my part. I'm writing it to output, not a file, so that it could be redirected if needed. A really simple example is that
cat file.txt | java Jzip | gzip -d | cmp file.txt
should return 0. The problem right now is if I write this byte array as is to output, it's just "raw" compressed data. I think gzip needs all this extra information.
If there's an alternative method, that would be fine too. The whole reason it's like this is that I need to use multiple threads. Otherwise I would just use GZIPOutputStream.
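To make the "extra information" concrete, here is a minimal sketch of one way the wrapping could look, not a definitive implementation. It assumes each block is compressed as raw deflate data (a Deflater created with nowrap = true), that every block except the last ends with SYNC_FLUSH, and that only the last block calls finish(). The class and method names (GzipWrapSketch, deflateBlock, wrapAsGzip, writeIntLE, gunzip) are made up for illustration. In the threaded version, the deflateBlock calls would run in parallel while the CRC is still computed over the uncompressed blocks in order.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.Deflater;
import java.util.zip.GZIPInputStream;

class GzipWrapSketch {
    /** Compresses one block as raw deflate data (nowrap = true, so no zlib header). */
    static byte[] deflateBlock(byte[] block, boolean last) {
        Deflater def = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
        def.setInput(block);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        if (last) {
            // Only the final block terminates the deflate stream.
            def.finish();
            while (!def.finished()) {
                int n = def.deflate(buf, 0, buf.length);
                out.write(buf, 0, n);
            }
        } else {
            // SYNC_FLUSH ends the output on a byte boundary so blocks can be joined.
            int n;
            do {
                n = def.deflate(buf, 0, buf.length, Deflater.SYNC_FLUSH);
                out.write(buf, 0, n);
            } while (n == buf.length); // a full buffer means more output may be pending
        }
        def.end();
        return out.toByteArray();
    }

    /** Wraps the compressed blocks in a single gzip member. Assumes at least one block. */
    static byte[] wrapAsGzip(byte[][] blocks) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // 10-byte header: magic 1f 8b, CM = 8 (deflate), FLG = 0, MTIME = 0, XFL = 0, OS = 255
        byte[] header = {0x1f, (byte) 0x8b, 8, 0, 0, 0, 0, 0, 0, (byte) 0xff};
        out.write(header, 0, header.length);
        CRC32 crc = new CRC32();
        long size = 0;
        for (int i = 0; i < blocks.length; i++) {
            byte[] z = deflateBlock(blocks[i], i == blocks.length - 1);
            out.write(z, 0, z.length);
            crc.update(blocks[i]); // CRC covers the uncompressed data, in order
            size += blocks[i].length;
        }
        // 8-byte trailer: CRC-32 and uncompressed size mod 2^32, both little-endian
        writeIntLE(out, crc.getValue());
        writeIntLE(out, size);
        return out.toByteArray();
    }

    static void writeIntLE(ByteArrayOutputStream out, long v) {
        for (int shift = 0; shift < 32; shift += 8) {
            out.write((int) (v >>> shift) & 0xff);
        }
    }

    /** Decompresses with GZIPInputStream, only to check the result in this sketch. */
    static byte[] gunzip(byte[] gz) {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(gz))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            for (int n; (n = in.read(buf)) != -1; ) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[][] blocks = {
            "Hello world! ".getBytes(StandardCharsets.UTF_8),
            "How are you? ".getBytes(StandardCharsets.UTF_8),
            "I'm ready to set fire to this assignment.".getBytes(StandardCharsets.UTF_8),
        };
        byte[] gz = wrapAsGzip(blocks);
        System.out.println(new String(gunzip(gz), StandardCharsets.UTF_8));
    }
}
```

Because each block uses its own Deflater with no shared dictionary, no block refers back to data outside itself, which is why the pieces can be joined byte for byte.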
DOUBLE EDIT: The comments offered a lot of good insight, so here is another approach. I have a bunch of uncompressed blocks of data that were originally one long stream. Suppose gzip can read concatenated streams: I take those blocks (keeping them in order), give each one to a thread that runs GZIPOutputStream on its own block, then concatenate the results. In essence, each block now has its own header, compressed data, and trailer. Would gzip recognize the result?
Example:
cat file.txt
Hello world! How are you? I'm ready to set fire to this assignment.
java Testcase < file.txt > file.txt.gz
So I accept it from input. Inside the program, the stream is split up into "Hello world!" "How are you?" "I'm ready to set fire to this assignment" (they're not strings, it's just an array of bytes! this is just illustration)
So I've got these three blocks of bytes, all uncompressed. I give each of these blocks to a thread, which uses
public static class DGZIPOutputStream extends GZIPOutputStream
{
    public DGZIPOutputStream(OutputStream out, boolean flush) throws IOException
    {
        super(out, flush);
    }

    public void setDictionary(byte[] b)
    {
        def.setDictionary(b);
    }

    public void updateCRC(byte[] input)
    {
        crc.update(input);
    }
}
As you can see, the only differences from a plain GZIPOutputStream are that I pass the flush flag so it uses SYNC_FLUSH (to get the alignment right) and that I can set the dictionary. If each thread were to use DGZIPOutputStream (which I've tested, and it works for one long continuous input), and I concatenated the three resulting blocks (each now compressed, with its own header and trailer), would gzip -d file.txt.gz work?
If that's too weird, ignore the dictionary completely. It doesn't really matter. I just added it in while I was at it.
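For what it's worth, the gzip format (RFC 1952) allows a file to consist of several members back to back, and gzip -d decompresses them as if they were one stream. Below is a minimal sketch of the concatenation idea without the dictionary, under a few assumptions: the names ConcatGzipSketch, gzipMember, gzipBlocks, and gunzip are made up, a parallel stream stands in for the worker threads, and the read-back relies on java.util.zip.GZIPInputStream continuing through concatenated members, which is how it behaves in the JDKs I know of.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class ConcatGzipSketch {
    /** Compresses one block into a complete gzip member (header, data, trailer). */
    static byte[] gzipMember(byte[] block) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(block);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    /** Compresses the blocks in parallel and concatenates the members in order. */
    static byte[] gzipBlocks(byte[][] blocks) {
        // toArray on an ordered stream keeps the original block order.
        byte[][] members = Arrays.stream(blocks)
                .parallel()
                .map(ConcatGzipSketch::gzipMember)
                .toArray(byte[][]::new);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] m : members) {
            out.write(m, 0, m.length);
        }
        return out.toByteArray();
    }

    /** Reads every member back as one continuous stream. */
    static byte[] gunzip(byte[] gz) {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(gz))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            for (int n; (n = in.read(buf)) != -1; ) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[][] blocks = {
            "Hello world! ".getBytes(StandardCharsets.UTF_8),
            "How are you? ".getBytes(StandardCharsets.UTF_8),
            "I'm ready to set fire to this assignment.".getBytes(StandardCharsets.UTF_8),
        };
        byte[] gz = gzipBlocks(blocks);
        System.out.println(new String(gunzip(gz), StandardCharsets.UTF_8));
    }
}
```

The trade-off compared with a single member is size: every block pays for its own 10-byte header and 8-byte trailer, and the compressor starts each block with no history.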