0 votes

I have (time series) data that looks like {21, 21, 22, 23, 24, 23, ....} and I'm trying to implement the JPEG algorithm in Java to see how JPEG compression behaves on such 1D data (my plan is to compare all sorts of compression algorithms).

I know that using JPEG on text data (or, in general, lossy compression on text data) doesn't make much sense, but my goal is to see what kind of intermediate patterns are generated (e.g. "automobile" becomes "qwses") and how much that intermediate pattern resembles the original words as the compression rate increases. So the idea is something like this: https://www.youtube.com/watch?v=meovx9OqWJc&t=1s

My input file, as I said above, is a 1x458 matrix and contains numbers between 10 and 300; e.g. {10, 13, 14, 14, 15, 12, ...., 247,247,249,.., 284, 283}

My main problem is that I am not quite sure how the JPEG algorithm (an 8x8 image block multiplied with an 8x8 DCT coefficient matrix, etc.) should theoretically be adapted to 1xN data (a line), and which parts of the JPEG implementation I should change (e.g. what the DCT coefficients should look like for such data). If someone can explain it with pseudo code, that would also be really nice.

So, it seems that you are really just trying to come up with a new compression algorithm? From what I can tell, JPEG implementations will pad images whose dimensions do not divide evenly into 8x8 (or whatever the block size is) blocks. – Tyler Nichols
Is it possible to convert 1D -> 2D by stacking up subsets and then splitting them back up to get the result? – Tyler Nichols
I am not sure if folding the data after the 8th column (to create blocks whose size is a multiple of 8) would affect the accuracy of the compression, since the data doesn't have any such property. I was thinking of maybe creating a 2D version of the data by adding 0s to fill out 8 more rows. I guess multiplying by 0s would not particularly affect the performance. Am I making any sense? – Chuckster

1 Answer

0 votes

You'd need to provide more information about the nature of your 1D data, and why you think it is compressible. What patterns are you expecting to see?

For example, if it really looks like the sequence you show (21,21,22,23,24,23), then a simple model using the difference of successive values would produce data that is highly compressible by a standard lossless compressor like gzip. E.g. (21,0,1,1,1,-1).
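As a minimal sketch of that idea in Java (the class and method names here are just illustrative, not part of any existing library), you could delta-encode the series and then run the result through java.util.zip to estimate the compressed size:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPOutputStream;

public class DeltaGzipSketch {

    // Replace each value (after the first) with its difference from the previous value.
    static int[] deltaEncode(int[] data) {
        int[] out = new int[data.length];
        out[0] = data[0];
        for (int i = 1; i < data.length; i++) {
            out[i] = data[i] - data[i - 1];
        }
        return out;
    }

    // Rough size estimate: gzip the deltas, stored here as one byte each
    // (which assumes the differences fit in -128..127).
    static int gzippedSize(int[] deltas) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            for (int d : deltas) {
                gz.write(d); // write(int) keeps only the low 8 bits
            }
        }
        return bos.size();
    }

    public static void main(String[] args) throws IOException {
        int[] series = {21, 21, 22, 23, 24, 23};
        int[] deltas = deltaEncode(series); // -> {21, 0, 1, 1, 1, -1}
        System.out.println("compressed bytes: " + gzippedSize(deltas));
    }
}
```

On a slowly varying series most deltas cluster around a few small values, which is exactly the kind of redundancy gzip exploits.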

To take advantage of higher-order correlations, what you might be looking for is an FFT. You can do an FFT efficiently on any sequence of 2^n samples (not only the eight-sample blocks that the JPEG DCT uses). There are libraries out there that do lossless integer FFTs, as well as other transforms like wavelets, that you can try.
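If you do want to stay close to JPEG rather than switch to an FFT, the most direct 1D adaptation is to split the series into blocks of 8 samples and apply a 1D DCT-II to each block. A rough sketch of that, with a naive O(N^2) DCT and names of my own choosing, might look like this:

```java
public class Dct1DSketch {

    // Naive 1D DCT-II of one block (the 1D analogue of the 2D DCT
    // that JPEG applies to 8x8 pixel blocks).
    static double[] dct(double[] block) {
        int n = block.length;
        double[] coeff = new double[n];
        for (int k = 0; k < n; k++) {
            double sum = 0.0;
            for (int i = 0; i < n; i++) {
                sum += block[i] * Math.cos(Math.PI / n * (i + 0.5) * k);
            }
            coeff[k] = sum;
        }
        return coeff;
    }

    public static void main(String[] args) {
        double[] series = {21, 21, 22, 23, 24, 23, 22, 21, 20, 19};
        int blockSize = 8;

        // Process the series block by block, padding the last block by
        // repeating its final sample (one of several possible choices).
        for (int start = 0; start < series.length; start += blockSize) {
            double[] block = new double[blockSize];
            for (int i = 0; i < blockSize; i++) {
                int idx = Math.min(start + i, series.length - 1);
                block[i] = series[idx];
            }
            double[] coeff = dct(block);

            // The lossy step would go here: quantize the coefficients
            // (coarser for higher frequencies), which is where JPEG-style
            // compression actually discards information.
            System.out.println(java.util.Arrays.toString(coeff));
        }
    }
}
```

Quantizing those coefficients and then entropy-coding them would complete a 1D JPEG-style pipeline; that part works the same way as in the 2D case.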