
I have a compressed file on disk that is partitioned into blocks. I read a block from disk, decompress it into memory, and then read the data.

Is it possible to set up a producer/consumer, with one thread that reads the compressed blocks from disk and puts them in a queue, and another thread that decompresses them and reads the data?

Will the performance be better?



2 Answers

1 vote

I suspect that the thread that decompresses the data would spend most of its time waiting for the thread that reads the compressed blocks from disk.

I'd be surprised if the CPU-bound decompression took longer than the I/O-bound reading of the blocks from disk.
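One way to check this before restructuring anything is to time the two stages separately on the real file. The sketch below is in Python and assumes a hypothetical block layout (a 4-byte big-endian length followed by a zlib-compressed payload); `read_block`, `time_stages` and `make_sample` are illustrative names, and you would substitute your own format and decompressor.

```python
import io
import struct
import time
import zlib

# ASSUMPTION (hypothetical format): each block is a 4-byte big-endian
# length followed by that many bytes of zlib-compressed data.
HEADER = struct.Struct(">I")


def read_block(f):
    """Return the next compressed block, or None at end of file."""
    header = f.read(HEADER.size)
    if len(header) < HEADER.size:
        return None
    (length,) = HEADER.unpack(header)
    return f.read(length)


def time_stages(f):
    """Return (seconds spent reading, seconds spent decompressing)."""
    read_s = decomp_s = 0.0
    while True:
        t0 = time.perf_counter()
        block = read_block(f)
        t1 = time.perf_counter()
        read_s += t1 - t0
        if block is None:
            break
        zlib.decompress(block)
        decomp_s += time.perf_counter() - t1
    return read_s, decomp_s


def make_sample(chunks):
    """Build an in-memory file in the assumed block format, for trying this out."""
    buf = io.BytesIO()
    for chunk in chunks:
        comp = zlib.compress(chunk)
        buf.write(HEADER.pack(len(comp)))
        buf.write(comp)
    buf.seek(0)
    return buf
```

On the real data you would call `time_stages(open("data.bin", "rb"))` (the file name is a placeholder). If the read time dominates, a second thread mostly waits, as suspected above. Keep in mind that a second run may be served from the OS page cache, which makes the read stage look cheaper than a cold read from disk.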

0 votes

Yes, it's possible to set it up that way. Whether you would see a performance improvement depends heavily on the machine, on exactly what you do with the decompressed data, and so on. If it's not too much trouble and your dataset is substantial, I'd suggest implementing it and measuring whether it's faster. If nothing else, it's similar to the work you'd need to do to use some sort of map-reduce framework.
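A minimal sketch of that setup in Python, using `threading` and a bounded `queue.Queue`. It assumes the same hypothetical block layout as above (a 4-byte big-endian length prefix, zlib payload); `producer`, `consumer`, `run_pipeline` and the `handle` callback are illustrative names, not part of any existing API. Blocking file reads release CPython's GIL, so the reader thread can fetch the next block while the calling thread is busy with the previous one.

```python
import queue
import struct
import threading
import zlib

HEADER = struct.Struct(">I")  # ASSUMPTION: 4-byte big-endian length prefix per block
_DONE = object()  # sentinel telling the consumer there are no more blocks


def producer(f, q):
    """Read compressed blocks from f and put them on the queue."""
    try:
        while True:
            header = f.read(HEADER.size)
            if len(header) < HEADER.size:
                break
            (length,) = HEADER.unpack(header)
            q.put(f.read(length))  # blocks while the queue is full (back-pressure)
    finally:
        q.put(_DONE)  # always wake the consumer, even if reading failed


def consumer(q, handle):
    """Decompress each block and pass the result to handle()."""
    while True:
        block = q.get()
        if block is _DONE:
            break
        handle(zlib.decompress(block))


def run_pipeline(f, handle, max_queued=8):
    """Read on a background thread; decompress and process on this one."""
    q = queue.Queue(maxsize=max_queued)
    reader = threading.Thread(target=producer, args=(f, q))
    reader.start()
    consumer(q, handle)
    reader.join()
```

Usage would look like `run_pipeline(open("data.bin", "rb"), process)`, where `process` stands in for whatever you do with each decompressed block. The `maxsize` bound keeps the reader from loading the whole file into memory if decompression is the slower stage. Timing this against the plain sequential loop on your real data is the measurement suggested above.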