3 votes

We have to compress a ton of (monochrome) image data and move it quickly. If one were to use just the parallelizable stages of JPEG compression (the DCT and run-length encoding of the quantized results) and run them on a GPU so each block is compressed in parallel, I am hoping that would be very fast and still yield a very significant compression factor, as full JPEG does.
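For concreteness, here is a minimal sketch of the per-block stage I have in mind (the kernel name, launch geometry, and quantization-table upload are my own invention, and the run-length encoding stage is omitted entirely):

    // One CUDA thread block per 8x8 pixel block, one thread per output
    // coefficient. Hypothetical sketch, not tested production code.
    #include <cuda_runtime.h>
    #include <math.h>

    __constant__ unsigned char d_quant[64];  // quantization table, copied from
                                             // the host with cudaMemcpyToSymbol()

    __global__ void dct_quantize_8x8(const unsigned char *src, short *dst, int width)
    {
        int bx = blockIdx.x * 8, by = blockIdx.y * 8;  // pixel block origin
        int u = threadIdx.x, v = threadIdx.y;          // output frequency indices

        __shared__ float tile[8][8];
        // Level-shift the samples as JPEG does (center them around zero).
        tile[v][u] = (float)src[(by + v) * width + (bx + u)] - 128.0f;
        __syncthreads();

        // Naive 2D DCT-II of the 8x8 tile, evaluated for coefficient (u, v).
        const float PI = 3.14159265f;
        float cu = (u == 0) ? 0.70710678f : 1.0f;  // 1/sqrt(2) normalization
        float cv = (v == 0) ? 0.70710678f : 1.0f;
        float sum = 0.0f;
        for (int y = 0; y < 8; ++y)
            for (int x = 0; x < 8; ++x)
                sum += tile[y][x]
                     * cosf((2 * x + 1) * u * PI / 16.0f)
                     * cosf((2 * y + 1) * v * PI / 16.0f);
        float coeff = 0.25f * cu * cv * sum;

        // Quantize: this division is where both the loss and the compression
        // come from. Each block's 64 coefficients are stored contiguously so
        // a later pass can run-length encode them per block.
        int idx = (blockIdx.y * gridDim.x + blockIdx.x) * 64 + v * 8 + u;
        dst[idx] = (short)roundf(coeff / (float)d_quant[v * 8 + u]);
    }

    // Launch (width and height assumed to be multiples of 8):
    // dct_quantize_8x8<<<dim3(width / 8, height / 8), dim3(8, 8)>>>(d_src, d_dst, width);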

Does anyone with more GPU / image compression experience have any idea how this would compare, both compression-wise and performance-wise, to using libjpeg on a CPU? (If it is a stupid idea, feel free to say so - I am an extreme novice in my knowledge of CUDA and the various stages of JPEG compression.) It will certainly compress less and hopefully(?) run faster, but I have no idea how significant those factors may be.

Some additional things to consider: What format is the data currently? Can you accept lossy compression? Is the data photo-like or document-like? – jeff7
I can't give you any implementation details, but I know that my 9-year-old video camera was able to do full-color 640x480 DCT compression at 30 frames a second. Recently announced DSLRs can do 1920x1080, 24 frames a second Motion JPEG. You should be able to achieve similar. – Mark Ransom
Does it have to be JPEG-based? Simple LZ-style compressors could probably do a decent job and be coded on a CPU, or perhaps even in a GPU shader with a little work. – Michael Dorgan
@jeff7 No, it's not document-like, or I wouldn't use a DCT. I can accept some lossiness. I like that the JPEG-like strategy offers control over the amount of loss through the choice of quantization matrix (take more loss => get greater compression, take less loss => get less compression); a sketch of that scaling follows the comments. – user334911
@Michael No, it doesn't have to be JPEG-based, but it needs a significant compression ratio, since the whole reason we want compression is to significantly reduce the amount of data transferred. From what I know of LZ compression I wouldn't expect a high ratio on raw image data, but am I mistaken? – user334911
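For reference, the quality knob mentioned above usually works by scaling a base quantization table. This sketch follows the convention of libjpeg's jpeg_quality_scaling(); the base table is the standard luminance table from the JPEG spec, while the function name and surrounding structure are illustrative:

    // Scale the standard JPEG luminance quantization table by a quality
    // setting in 1..100. Quality 50 keeps the base table; lower quality
    // means larger divisors, i.e. more loss and more compression.
    static const unsigned char base_luma[64] = {
        16, 11, 10, 16,  24,  40,  51,  61,
        12, 12, 14, 19,  26,  58,  60,  55,
        14, 13, 16, 24,  40,  57,  69,  56,
        14, 17, 22, 29,  51,  87,  80,  62,
        18, 22, 37, 56,  68, 109, 103,  77,
        24, 35, 55, 64,  81, 104, 113,  92,
        49, 64, 78, 87, 103, 121, 120, 101,
        72, 92, 95, 98, 112, 100, 103,  99
    };

    void scale_quant_table(int quality, unsigned char out[64])
    {
        if (quality < 1)   quality = 1;
        if (quality > 100) quality = 100;
        int scale = (quality < 50) ? 5000 / quality : 200 - 2 * quality;
        for (int i = 0; i < 64; ++i) {
            int q = (base_luma[i] * scale + 50) / 100;
            if (q < 1)   q = 1;     // a divisor of 1 is essentially lossless
            if (q > 255) q = 255;   // baseline JPEG stores 8-bit table entries
            out[i] = (unsigned char)q;
        }
    }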

1 Answer

0 votes

You could hardly get more compression out of a GPU - there are just no algorithms complex enough to make use of that MUCH power.

When working with simple algorithms like JPEG, the computation is so cheap that you'll spend most of the time transferring data over the PCI-E bus (which has significant latency, especially when the card does not support DMA transfers).

The positive side is that if the card has DMA, you can free up the CPU for more important work and get image compression "for free".

In the best case, you can get about a 10x improvement on a top-end GPU compared to a top-end CPU, provided that both the CPU and the GPU code are well-optimized.
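On the DMA point, here is a rough sketch of what overlapping transfer and compute could look like, assuming pinned host buffers and two CUDA streams. It reuses the hypothetical dct_quantize_8x8 kernel from the question and is a sketch under those assumptions, not a tested implementation:

    #include <cuda_runtime.h>

    // Per-block kernel sketched in the question; assumed defined in this file.
    __global__ void dct_quantize_8x8(const unsigned char *src, short *dst, int width);

    void compress_frames(int nframes, int width, int height)
    {
        size_t in_bytes  = (size_t)width * height;    // one byte per monochrome pixel
        size_t out_bytes = in_bytes * sizeof(short);  // one quantized coefficient per pixel

        // Two of everything, so frame N+1's upload can overlap frame N's compute.
        unsigned char *h_in[2], *d_in[2];
        short *h_out[2], *d_out[2];
        cudaStream_t stream[2];
        for (int i = 0; i < 2; ++i) {
            cudaMallocHost(&h_in[i], in_bytes);   // pinned memory enables async DMA copies
            cudaMallocHost(&h_out[i], out_bytes);
            cudaMalloc(&d_in[i], in_bytes);
            cudaMalloc(&d_out[i], out_bytes);
            cudaStreamCreate(&stream[i]);
        }

        for (int f = 0; f < nframes; ++f) {
            int s = f & 1;                     // ping-pong between streams/buffers
            cudaStreamSynchronize(stream[s]);  // wait until buffer pair s is free
            /* ... fill h_in[s] with frame f ... */
            cudaMemcpyAsync(d_in[s], h_in[s], in_bytes,
                            cudaMemcpyHostToDevice, stream[s]);
            dct_quantize_8x8<<<dim3(width / 8, height / 8), dim3(8, 8), 0, stream[s]>>>(
                d_in[s], d_out[s], width);
            cudaMemcpyAsync(h_out[s], d_out[s], out_bytes,
                            cudaMemcpyDeviceToHost, stream[s]);
            /* ... meanwhile, run-length encode h_out[s ^ 1] on the CPU ... */
        }
        cudaDeviceSynchronize();

        for (int i = 0; i < 2; ++i) {
            cudaFreeHost(h_in[i]); cudaFreeHost(h_out[i]);
            cudaFree(d_in[i]);     cudaFree(d_out[i]);
            cudaStreamDestroy(stream[i]);
        }
    }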