
I am trying to implement an LSB embedding steganalysis algorithm.

The database consists of a few thousand 24-bit color BMP images. Almost all research on steganalysis of LSB embedding focuses on grayscale images. I want to try to use the inter-color correlations within an image to detect the presence of a hidden message, rather than treating the image as three independent grayscale channels.

I searched for work on this topic and found a very simple algorithm (I can't provide the link). It uses data compression to detect hidden messages.

In short:

It states that the more information is hidden in a file, the larger its compressed (archived) size becomes for compression methods such as RAR, NanoZip, PNG, etc., because data compression algorithms exploit inter-color correlations.

There was no proof or reference given for these statements. I don't know how data compression algorithms work; intuitively I agree with them, but I want to know for sure whether they are true, for example for the gzip or zip algorithms.
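For example, here is roughly the kind of check I have in mind, using Python's zlib (the same DEFLATE algorithm behind gzip and zip) on a synthetic smooth image; the image and the parameters are only for illustration:

import zlib
import numpy as np

# Synthetic "smooth" 8-bit image: every row is the same gradient, so
# neighboring pixels (and neighboring rows) are highly correlated.
cover = np.tile(np.arange(256, dtype=np.uint8), (256, 1))

# LSB replacement: overwrite every pixel's least significant bit with a random message bit.
rng = np.random.default_rng(0)
message_bits = rng.integers(0, 2, size=cover.shape, dtype=np.uint8)
stego = (cover & 0xFE) | message_bits

# Compare DEFLATE-compressed sizes.
print(len(zlib.compress(cover.tobytes())))   # small: the gradient is highly redundant
print(len(zlib.compress(stego.tobytes())))   # larger: the randomized LSBs add incompressible noise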


If the work you found with the algorithm comes from a paper, is it possible to provide the title of the article and its authors? – Reti43
A New Compression-Based Method for Estimating LSB Replacement Rate in Color and Grayscale Images – baira

1 Answer


I don't know how any of these compression algorithms work, but I'll take a general stab at it.

Compression works by removing redundancy, and uncompressed images generally have lots of it because neighbouring pixels are correlated. That is, the colour in an area of an image tends to be similar because colour changes are slow and smooth.

However, the bit stream you embed tends to have no pattern and represents more of a random distribution of 1s and 0s. Random/noisy data have fewer patterns to take advantage of for compression. So, while an image can be compressed quite nicely, the compression won't reduce the file size as much after you hide your information, i.e., make the pixels more random.
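As a rough illustration of that point, highly repetitive bytes compress far better under DEFLATE (the algorithm behind zip/gzip) than random bytes of the same length; this is just a generic demo, not specific to images:

import os
import zlib

smooth = bytes([200]) * 10000      # very redundant data
noisy = os.urandom(10000)          # random data, essentially no redundancy

print(len(zlib.compress(smooth)))  # a few dozen bytes
print(len(zlib.compress(noisy)))   # roughly 10000 bytes or slightly more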

To demonstrate this, let's assume a basic and probably naive compression algorithm which detects that a pixel value, P, is repeated N times and stores the run as N, P (essentially run-length encoding; a toy implementation is sketched after the example below). For example:

200, 200, 200, 200, 200, 200, 200, 201, 201, 201, 201     # uncompressed
7, 200, 4, 201                                            # compressed
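Here is one possible toy implementation of that run-length idea (my own sketch, not the compressor from any paper):

def rle(pixels):
    # Collapse each run of identical values into a [count, value] pair
    encoded = []
    for value in pixels:
        if encoded and encoded[-1][1] == value:
            encoded[-1][0] += 1
        else:
            encoded.append([1, value])
    # Flatten to the N, P, N, P, ... form used above
    return [x for pair in encoded for x in pair]

print(rle([200] * 7 + [201] * 4))   # [7, 200, 4, 201]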

Now embed 1 message bit in each of these pixels. Since the message bits are effectively random, the probability that a pixel's LSB flips from 0 to 1 or vice versa is 50%, so approximately half of the pixel values will change. You could end up with something like this:

200, 200, 201, 200, 201, 201, 200, 200, 201, 200, 201          # uncompressed
2, 200, 1, 201, 1, 200, 2, 201, 2, 200, 1, 201, 1, 200, 1, 201 # compressed
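You can simulate this with the rle sketch above; the exact output depends on the random message, but the encoded list will contain many more (count, value) pairs:

import random

random.seed(1)
cover = [200] * 7 + [201] * 4

# LSB replacement: each pixel's last bit is overwritten by a random message bit,
# so on average half of the pixel values change
stego = [(p & ~1) | random.randint(0, 1) for p in cover]

print(rle(cover))   # [7, 200, 4, 201]
print(rle(stego))   # many short runs, so a much longer list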

Notice how much more data is required for the compressed form compared to the original case. For comparison, embed 1 bit in only the first two pixels:

201, 200, 200, 200, 200, 200, 200, 201, 201, 201, 201     # uncompressed
1, 201, 6, 200, 4, 201                                    # compressed
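That is essentially why the compressed size can be used to estimate the embedding rate: the fewer pixels carry message bits, the less the compressed size grows. A quick illustrative sweep over embedding rates (using zlib as the compressor and a toy, very smooth "image"):

import random
import zlib

random.seed(0)
cover = bytes([200] * 5000 + [201] * 5000)   # toy, highly redundant "image"

for rate in (0.0, 0.25, 0.5, 1.0):
    stego = bytes(
        (p & ~1) | random.randint(0, 1) if random.random() < rate else p
        for p in cover
    )
    # The compressed size grows with the fraction of pixels carrying message bits
    print(rate, len(zlib.compress(stego)))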