I have an assignment where we need to write a (very simplified) multithreaded compression program in Java (using the built-in GZIPOutputStream is fine). I understand mostly how to do everything else, but one part of the instructions specifies that we have to divide the input stream into 128 kB blocks, and each thread will work on compressing one of those blocks.

I think I'm missing something glaringly obvious, but at the moment the only way I can think of to split up the input stream is using java.io.InputStream's read() method (which reads one byte at a time) and manually counting up to 128 kB, or stopping earlier if we reach the end of the file first. But that seems horribly inefficient.

Another thing, which is somewhat of a tangent and which I'm still busily googling to figure out, is that it says we're supposed to use the last 32 kB of each previous block to prime the compression dictionary for the next block. I vaguely understand what that means, although I'm not entirely sure how to implement it or whether it affects how I should treat the input stream byte by byte.

EDIT: To clarify, unless I'm thinking of a different dictionary, the Deflater class can take an int that sets the compression level (1 to 9), but I'm not sure how to get that to correspond to 32 kB, unless I have to write a new dictionary altogether. We're only doing compression, so there's no need for the Inflater class or such.


1 Answer


I think I'm missing something glaringly obvious, but at the moment the only way I can think of to split up the input stream is using java.io.InputStream's read() method (which reads one byte at a time) and manually counting up to 128 kB, or stopping earlier if we reach the end of the file first. But that seems horribly inefficient.

Yes. That would be a terrible solution.

You need to use either BufferedInputStream or InputStream.read(byte[], int, int). The File.length() method could also prove useful, if the input is a file.
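As a rough sketch of the second option: read(byte[], int, int) may return fewer bytes than requested, so the usual pattern is to loop until the buffer is full or the stream ends. The class and method names below (BlockReader, readBlock, readAllBlocks) are made up for illustration, and 128 kB is assumed to mean 128 * 1024 bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of any library.
final class BlockReader {
    // Assumption: "128 kB" means 128 * 1024 bytes.
    static final int BLOCK_SIZE = 128 * 1024;

    // Reads up to blockSize bytes; returns null once the stream is exhausted.
    static byte[] readBlock(InputStream in, int blockSize) throws IOException {
        byte[] buf = new byte[blockSize];
        int filled = 0;
        while (filled < blockSize) {
            // A single read() call may deliver fewer bytes than asked for.
            int n = in.read(buf, filled, blockSize - filled);
            if (n < 0) {
                break; // end of stream
            }
            filled += n;
        }
        if (filled == 0) {
            return null;
        }
        // The last block is usually short, so trim it to the bytes actually read.
        return filled == blockSize ? buf : Arrays.copyOf(buf, filled);
    }

    // Collects every block in memory; fine for a demo, wasteful for huge inputs.
    static List<byte[]> readAllBlocks(InputStream in, int blockSize) throws IOException {
        List<byte[]> blocks = new ArrayList<>();
        byte[] block;
        while ((block = readBlock(in, blockSize)) != null) {
            blocks.add(block);
        }
        return blocks;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a real file or System.in: 300 KiB of zero bytes.
        InputStream in = new ByteArrayInputStream(new byte[300 * 1024]);
        for (byte[] block : readAllBlocks(in, BLOCK_SIZE)) {
            System.out.println("block of " + block.length + " bytes");
        }
    }
}
```

In a real program you would probably hand each block to a worker as soon as it is read instead of collecting them all in a list first.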

Since this is homework, I'll leave you to work out the details.
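On the dictionary part of the question: the compression level is unrelated to the dictionary. Deflater has a separate setDictionary(byte[], int, int) method, and "priming" most likely means passing it the last 32 kB of the previous block before compressing the next one. The sketch below only illustrates that call. The class name PrimedBlockCompressor and its method are invented for this example, it produces raw deflate data per block, and it does not show how the per-block outputs are combined into a valid gzip file.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.Random;
import java.util.zip.Deflater;

// Hypothetical helper, not part of any library.
final class PrimedBlockCompressor {
    // Assumption: "32 kB" means 32 * 1024 bytes.
    static final int DICT_SIZE = 32 * 1024;

    // Compresses one block; previousBlock may be null for the first block.
    static byte[] compressBlock(byte[] block, byte[] previousBlock) {
        // nowrap = true gives raw deflate output without a zlib header.
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
        try {
            if (previousBlock != null && previousBlock.length > 0) {
                // Prime the dictionary with the tail of the previous block.
                int dictLen = Math.min(DICT_SIZE, previousBlock.length);
                deflater.setDictionary(previousBlock, previousBlock.length - dictLen, dictLen);
            }
            deflater.setInput(block);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            while (!deflater.finished()) {
                int n = deflater.deflate(buf);
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // release native resources
        }
    }

    public static void main(String[] args) {
        // Random data compresses badly on its own, so any gain comes from the dictionary.
        byte[] previous = new byte[DICT_SIZE];
        new Random(42).nextBytes(previous);
        byte[] block = Arrays.copyOfRange(previous, DICT_SIZE / 2, DICT_SIZE);
        System.out.println("without dictionary: " + compressBlock(block, null).length + " bytes");
        System.out.println("with dictionary:    " + compressBlock(block, previous).length + " bytes");
    }
}
```

Because each block only needs the previous block's input bytes (not its compressed output), the blocks can still be compressed independently on separate threads.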