14 votes

I recently switched from cv2 to TensorFlow's tf.image module for image processing. However, my validation accuracy dropped by around 10%.

I believe the issue is related to

  1. cv2.imread() vs. tf.image.decode_jpeg()
  2. cv2.resize() vs. tf.image.resize_images()

While these differences result in worse accuracy, the images appear indistinguishable to the human eye when viewed with plt.imshow(). For example, take Image #1 of the ImageNet Validation Dataset:

CV2 Image

First issue:

  • cv2.imread() takes in a string and outputs a BGR 3-channel uint8 matrix.
  • tf.image.decode_jpeg() takes in a string tensor and outputs an RGB 3-channel uint8 tensor.

However, after converting the tf tensor to BGR format, there are very slight differences at many pixels in the image.

Using tf.image.decode_jpeg and then converting to BGR

```
[[ 26  41  24 ...,  57  48  46]
 [ 36  39  36 ...,  24  24  29]
 [ 41  26  34 ...,  11  17  27]
 ...,
 [ 71  67  61 ..., 106 105 100]
 [ 66  63  59 ..., 106 105 101]
 [ 64  66  58 ..., 106 105 101]]
```

Using cv2.imread

```
[[ 26  42  24 ...,  57  48  48]
 [ 38  40  38 ...,  26  27  31]
 [ 41  28  36 ...,  14  20  31]
 ...,
 [ 72  67  60 ..., 108 105 102]
 [ 65  63  58 ..., 107 107 103]
 [ 65  67  60 ..., 108 106 102]]
```
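
For context, here is a minimal sketch of how this decode comparison might be set up (this is not the code from the gist; the file path and the TF 1.x session API are assumptions):

```
# Sketch only: compares cv2.imread() with tf.image.decode_jpeg() on one JPEG.
import cv2
import numpy as np
import tensorflow as tf

image_path = 'ILSVRC2012_val_00000001.JPEG'  # hypothetical path

# OpenCV decodes straight to a BGR uint8 array
cv2_bgr = cv2.imread(image_path)

# TensorFlow decodes to an RGB uint8 tensor; reverse the channels to get BGR
with tf.Session() as sess:
    raw = tf.read_file(image_path)
    tf_rgb = tf.image.decode_jpeg(raw, channels=3)
    tf_bgr = sess.run(tf_rgb)[..., ::-1]

print(cv2_bgr[:, :, 0])   # one channel, as in the printouts above
print(tf_bgr[:, :, 0])
print(np.abs(cv2_bgr.astype(np.int32) - tf_bgr.astype(np.int32)).max())
```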

Second issue:

  • tf.image.resize_images() automatically converts a uint8 tensor to a float32 tensor, and seems to exacerbate the differences in pixel values.
  • I believe that tf.image.resize_images() and cv2.resize() both default to bilinear interpolation, yet their outputs still differ (see the sketch after the outputs below).

tf.image.resize_images

```
[[  26.           25.41850281   35.73127747 ...,   81.85855103   59.45834351   49.82373047]
 [  38.33480072   32.90485001   50.90826797 ...,   86.28446198   74.88543701   20.16353798]
 [  51.27312469   26.86172867   39.52401352 ...,   66.86851501   81.12111664   33.37636185]
 ...,
 [  70.59472656   75.78851318   45.48100662 ...,   70.18637085   88.56777191   97.19295502]
 [  70.66964722   59.77249908   48.16699219 ...,   74.25527954   97.58244324  105.20263672]
 [  64.93395996   59.72298431   55.17600632 ...,   77.28720856   98.95108032  105.20263672]]
```

cv2.resize

```
[[ 36  30  34 ..., 102  59  43]
 [ 35  28  51 ...,  85  61  26]
 [ 28  39  50 ...,  59  62  52]
 ...,
 [ 75  67  34 ...,  74  98 101]
 [ 67  59  43 ...,  86 102 104]
 [ 66  65  48 ...,  86 103 105]]
```
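
Again for context, a sketch of how the resize comparison might look, assuming a 224x224 target size (the actual size is in the gist) and explicitly requesting bilinear interpolation from both libraries:

```
# Sketch only: resizes the same BGR uint8 array with cv2 and with TF.
import cv2
import numpy as np
import tensorflow as tf

image = cv2.imread('ILSVRC2012_val_00000001.JPEG')  # BGR uint8, hypothetical path
target_h, target_w = 224, 224                       # assumed target size

# cv2.resize keeps the uint8 dtype; note the (width, height) argument order
cv2_resized = cv2.resize(image, (target_w, target_h),
                         interpolation=cv2.INTER_LINEAR)

# tf.image.resize_images casts to float32 before interpolating
with tf.Session() as sess:
    tf_resized = sess.run(tf.image.resize_images(
        tf.constant(image), [target_h, target_w],
        method=tf.image.ResizeMethod.BILINEAR))

print(cv2_resized[:, :, 0])
print(tf_resized[:, :, 0])
print(np.abs(tf_resized - cv2_resized.astype(np.float32)).mean())
```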

Here's a gist demonstrating the behavior just mentioned. It includes the full code for how I am processing the image.

So my main questions are:

  • Why is the output of cv2.imread() and tf.image.decode_jpeg() different?
  • How are cv2.resize() and tf.image.resize_images() different if they use the same interpolation scheme?

Thank you!

1
I don't think a BGR image should look like this. I would have expected 3-value tuples (well, arrays) placed in a 2-dimensional array (making a 3D array once put together). Can you try printing the array again along with its shape, for each function, and show us the result? – Alceste_
tf.image.decode_jpeg has two decoding options: fast and accurate. The fast one doesn't conform to the JPEG specification, so setting dct_method to "INTEGER_ACCURATE" should make both the same. I don't think there is any standard defined for image interpolation schemes, so everyone has their own way of doing this. – vijay m
Did you ever figure out why cv2 and tf have different bilinear interpolation results? – Alex
Did you figure out a way to make the cv2 output match that of TensorFlow? – Effective_cellist
Unfortunately not. After fixing other bugs in my code, I was able to handle the slight differences between the cv2 and tf output. – txizzle

1 Answer

8 votes

As vijay m correctly points out, changing dct_method to "INTEGER_ACCURATE" gives you the same uint8 image from cv2 and tf. The problem indeed seems to be the resizing method. I also tried to force TensorFlow to use the same interpolation method that cv2 uses by default (bilinear), but the results are still different. This might be because cv2 does the interpolation on integer values while TensorFlow converts to float32 before interpolating, but that is only a guess. If you plot the pixel-wise difference between the images resized by TF and cv2, you get the following histogram:

Histogram of pixel-wise difference
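
A sketch of what that looks like in code (the file path, the 224x224 target size, and the TF 1.x session API are assumptions):

```
# Sketch only: the dct_method fix plus the pixel-wise difference histogram.
import cv2
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf

image_path = 'ILSVRC2012_val_00000001.JPEG'  # hypothetical path

cv2_bgr = cv2.imread(image_path)

with tf.Session() as sess:
    raw = tf.read_file(image_path)
    # 'INTEGER_ACCURATE' selects the spec-conforming (slower) JPEG decode path
    tf_rgb = tf.image.decode_jpeg(raw, channels=3, dct_method='INTEGER_ACCURATE')
    tf_bgr = sess.run(tf_rgb)[..., ::-1]
    print((cv2_bgr == tf_bgr).all())  # decoding should now match

    # Resize both and look at the distribution of the pixel-wise difference
    size = [224, 224]  # assumed target size
    cv2_resized = cv2.resize(cv2_bgr, (size[1], size[0]),
                             interpolation=cv2.INTER_LINEAR).astype(np.float32)
    tf_resized = sess.run(tf.image.resize_images(tf.constant(tf_bgr), size))

plt.hist((tf_resized - cv2_resized).ravel(), bins=100)
plt.title('tf.image.resize_images - cv2.resize')
plt.show()
```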

As you can see, the differences look pretty normally distributed. (I was also surprised by the amount of pixel-wise difference.) Your accuracy drop could lie exactly here. In this paper Goodfellow et al. describe the effect of adversarial examples on classification systems, and I think the problem here is something similar: if the original weights of your network were trained using an input pipeline that produced the cv2 results, an image from the TF input pipeline acts something like an adversarial example.

(See the image at the top of page 3 for an example... I can't post more than two links.)

So in the end I think that if you want to use the original network weights on the same data they were trained on, you should stay with a similar/same input pipeline. If you use the weights to fine-tune the network on your own data, this should not be a big concern, because you retrain the classification layer to work with the new input images (from the TF pipeline).

And @Ishant Mrinal: please have a look at the code the OP provided in the gist. He is aware of the difference between BGR (cv2) and RGB (TF) and converts the images to the same color space.