I recently switched from cv2 to TensorFlow's tf.image module for image processing. However, my validation accuracy dropped by around 10%. I believe the issue is related to:
- cv2.imread() vs. tf.image.decode_jpeg()
- cv2.resize() vs. tf.image.resize_images()
While these differences result in worse accuracy, the images appear indistinguishable to the human eye when viewed with plt.imshow(). For example, take image #1 of the ImageNet validation dataset:
First issue:
- cv2.imread() takes a file path string and outputs a 3-channel BGR uint8 matrix
- tf.image.decode_jpeg() takes a string tensor and outputs a 3-channel RGB uint8 tensor.
However, after converting the tf tensor to BGR format, there are very slight differences at many pixels in the image.
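A minimal sketch of how this comparison can be set up (TF 1.x graph API; the file path below is a placeholder I'm assuming for validation image #1):

```python
import cv2
import numpy as np
import tensorflow as tf

path = 'val_image_1.jpg'  # placeholder path to ImageNet validation image #1

# cv2: decodes straight to a BGR uint8 ndarray
bgr_cv = cv2.imread(path)

# tf: decodes to an RGB uint8 tensor; run it in a session to get an ndarray
with tf.Session() as sess:
    rgb_tf = sess.run(tf.image.decode_jpeg(tf.read_file(path), channels=3))
bgr_tf = rgb_tf[..., ::-1]  # reverse the channel axis: RGB -> BGR

# quantify the per-pixel differences
diff = np.abs(bgr_cv.astype(np.int32) - bgr_tf.astype(np.int32))
print(diff.max(), (diff > 0).mean())  # max deviation and fraction of differing pixels
```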
Using tf.image.decode_jpeg and then converting to BGR:
```
[[ 26  41  24 ...,  57  48  46]
 [ 36  39  36 ...,  24  24  29]
 [ 41  26  34 ...,  11  17  27]
 ...,
 [ 71  67  61 ..., 106 105 100]
 [ 66  63  59 ..., 106 105 101]
 [ 64  66  58 ..., 106 105 101]]
```
Using cv2.imread:
```
[[ 26  42  24 ...,  57  48  48]
 [ 38  40  38 ...,  26  27  31]
 [ 41  28  36 ...,  14  20  31]
 ...,
 [ 72  67  60 ..., 108 105 102]
 [ 65  63  58 ..., 107 107 103]
 [ 65  67  60 ..., 108 106 102]]
```
Second issue:
- tf.image.resize_images() automatically converts a uint8 tensor to a float32 tensor, which seems to exacerbate the differences in pixel values.
- I believe that tf.image.resize_images() and cv2.resize() are both using bilinear interpolation (see the comparison sketch after this list).
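A minimal sketch of the resize comparison, reusing bgr_cv and bgr_tf from the sketch above (the 224×224 target size is a placeholder):

```python
size = (224, 224)  # placeholder target size

# cv2: bilinear by default, output stays uint8
resized_cv = cv2.resize(bgr_cv, size, interpolation=cv2.INTER_LINEAR)

# tf: explicitly bilinear; note tf.image.resize_images takes (height, width)
# while cv2.resize takes (width, height), which coincide here for a square target
with tf.Session() as sess:
    resized_tf = sess.run(tf.image.resize_images(
        bgr_tf, size, method=tf.image.ResizeMethod.BILINEAR))

# resize_images returns float32, so round and cast back to uint8 before comparing
resized_tf_u8 = np.round(resized_tf).astype(np.uint8)
print(np.abs(resized_cv.astype(np.int32) - resized_tf_u8.astype(np.int32)).max())
```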
Using tf.image.resize_images:
```
[[  26.           25.41850281   35.73127747 ...,   81.85855103   59.45834351   49.82373047]
 [  38.33480072   32.90485001   50.90826797 ...,   86.28446198   74.88543701   20.16353798]
 [  51.27312469   26.86172867   39.52401352 ...,   66.86851501   81.12111664   33.37636185]
 ...,
 [  70.59472656   75.78851318   45.48100662 ...,   70.18637085   88.56777191   97.19295502]
 [  70.66964722   59.77249908   48.16699219 ...,   74.25527954   97.58244324  105.20263672]
 [  64.93395996   59.72298431   55.17600632 ...,   77.28720856   98.95108032  105.20263672]]
```
Using cv2.resize:
```
[[ 36  30  34 ..., 102  59  43]
 [ 35  28  51 ...,  85  61  26]
 [ 28  39  50 ...,  59  62  52]
 ...,
 [ 75  67  34 ...,  74  98 101]
 [ 67  59  43 ...,  86 102 104]
 [ 66  65  48 ...,  86 103 105]]
```
Here's a gist demonstrating the behavior just mentioned. It includes the full code for how I am processing the image.
So my main questions are:
- Why is the output of cv2.imread() and tf.image.decode_jpeg() different?
- How are cv2.resize() and tf.image.resize_images() different if they use the same interpolation scheme?
Thank you!
tf.image.decode_jpeg has two decoding options: faster and accurate. The faster one doesn't conform to the JPEG specification, so setting dct_method to "INTEGER_ACCURATE" should make both outputs the same. I don't think there is any standard defined for image interpolation schemes, so everyone has their own way of doing this. – vijay m
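A minimal sketch of the suggested fix (dct_method is the relevant keyword argument of tf.image.decode_jpeg; the path is the same placeholder as above):

```python
import tensorflow as tf

path = 'val_image_1.jpg'  # placeholder path to ImageNet validation image #1

# decode with the spec-conformant integer IDCT instead of the faster approximation
with tf.Session() as sess:
    rgb_accurate = sess.run(tf.image.decode_jpeg(
        tf.read_file(path),
        channels=3,
        dct_method='INTEGER_ACCURATE'))
```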
cv2 and tf have different bilinear interp results? – Alex
How do you make the cv output match that of tensorflow? – Effective_cellist