Why does training succeed and predict fail with keras and tensorflow?

Question

I train a CNN to detect the presence of hazard ids (example) on images.

I get 99% accuracy in training. But when I then predict on pictures from the training set, the model fails to give the correct answer.

Can you please tell me how this is possible?

Code taken from Google's cats and dogs example (https://developers.google.com/machine-learning/practica/image-classification/exercise-1):

#!/usr/bin/env python3

import matplotlib.pyplot as plt
from tensorflow.keras import layers
from tensorflow.keras import Model
from tensorflow.keras.optimizers import RMSprop
from tensorflow.keras.preprocessing.image import ImageDataGenerator
import numpy as np
import os
import cv2

def predict(frame):
    frame = cv2.resize(frame, (150, 150))
    frame = np.expand_dims(frame, axis=0)
    frame = np.asarray(frame, dtype='int32')
    frame = frame / 255
    return model.predict(frame)

base_dir = './cats_and_dogs_filtered'
train_dir = os.path.join(base_dir, 'train')
validation_dir = os.path.join(base_dir, 'validation')

train_cats_dir = os.path.join(train_dir, 'cats')
train_dogs_dir = os.path.join(train_dir, 'dogs')

validation_cats_dir = os.path.join(validation_dir, 'cats')
validation_dogs_dir = os.path.join(validation_dir, 'dogs')

train_cat_fnames = os.listdir(train_cats_dir)
print(train_cat_fnames[:10])

train_dog_fnames = os.listdir(train_dogs_dir)
train_dog_fnames.sort()
print(train_dog_fnames[:10])

img_input = layers.Input(shape=(150, 150, 3))
x = layers.Conv2D(16, 3, activation='relu')(img_input)
x = layers.MaxPooling2D(2)(x)
x = layers.Conv2D(32, 3, activation='relu')(x)
x = layers.MaxPooling2D(2)(x)
x = layers.Conv2D(64, 3, activation='relu')(x)
x = layers.MaxPooling2D(2)(x)
x = layers.Flatten()(x)
x = layers.Dense(512, activation='relu')(x)
output = layers.Dense(1, activation='sigmoid')(x)

model = Model(img_input, output)

model.compile(loss='binary_crossentropy',
              optimizer=RMSprop(lr=0.001),
              metrics=['acc'])

train_datagen = ImageDataGenerator(rescale=1./255)
test_datagen = ImageDataGenerator(rescale=1./255)

train_generator = train_datagen.flow_from_directory(
        train_dir,  # This is the source directory for training images
        target_size=(150, 150),  # All images will be resized to 150x150
        batch_size=20,
        # Since we use binary_crossentropy loss, we need binary labels
        class_mode='binary')

validation_generator = test_datagen.flow_from_directory(
        validation_dir,
        target_size=(150, 150),
        batch_size=20,
        class_mode='binary')

history = model.fit_generator(
      train_generator,
      steps_per_epoch=100,  # 2000 images = batch_size * steps
      epochs=15,
      validation_data=validation_generator,
      validation_steps=50,  # 1000 images = batch_size * steps
      verbose=2)

#model.save("trained_cnn_cats_and_dogs.h5")

for i in range(100):
    frame_cat = cv2.imread(train_cats_dir + "/" + train_cat_fnames[i])
    predictions_cat = predict(frame_cat)

    frame_dog = cv2.imread(train_dogs_dir + "/" + train_dog_fnames[i])
    predictions_dog = predict(frame_dog)

    print("Dog: " + str(predictions_dog[0]) + " Cat: " + str(predictions_cat[0]))

For the dataset I have copied the dogs over the cats (so that both image sets are the same) and then put a hazard id over each former cat picture: link to the data (dropbox)

So I have "dogs without a hazard id" in the dogs directory and "dogs with hazard id sign" in the cats directory.

Output:

Epoch 15/15
 - 14s - loss: 0.0398 - acc: 0.9910 - val_loss: 0.1332 - val_acc: 0.9760
Dog: [0.9991505] Cat: [0.9996587]
Dog: [0.9996618] Cat: [0.9988152]
Dog: [0.99470115] Cat: [0.99987006]

So I can't tell if there is a hazard sign or not: Dog would mean "no hazard sign", cat would mean "hazard sign".

The same code works perfectly with the original cats and dogs data.
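
To read the numbers above, it helps to know how the labels are assigned: flow_from_directory numbers the class folders alphabetically, so cats (hazard sign) is 0 and dogs (no hazard sign) is 1, and the single sigmoid output is the probability of class 1. Values near 1 for both pictures therefore mean the model calls everything "no hazard sign". A minimal sketch of that decoding follows; decode_prediction and the 0.5 threshold are illustrative assumptions, and in the real script the dictionary would come from train_generator.class_indices:

```python
def decode_prediction(prob, class_indices, threshold=0.5):
    """Map a sigmoid output to a class name (hypothetical helper).

    class_indices is a {name: index} dict such as the one exposed by
    train_generator.class_indices; threshold=0.5 is an assumption.
    """
    index_to_name = {index: name for name, index in class_indices.items()}
    return index_to_name[int(prob >= threshold)]


# Hard-coded here for illustration: folders sorted alphabetically
class_indices = {"cats": 0, "dogs": 1}

print(decode_prediction(0.9991505, class_indices))  # dogs
print(decode_prediction(0.0004, class_indices))     # cats
```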

I provided an answer but deleted it: after re-reading the question I understood what you did, so the answer was unrelated. - Stormsson

lthp · Accepted Answer · 2018-07-17T15:11:02

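The problem was the channel order (not the byte order): cv2.imread() returns images as Blue-Green-Red (BGR), whereas ImageDataGenerator loads them as Red-Green-Blue (RGB). The model was trained on RGB input but received BGR input at prediction time.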

Solution:

def prepare_frame(frame):
    # cv2.imread() returns BGR; the model was trained on RGB images
    frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    frame = cv2.resize(frame, (150, 150))
    frame = np.expand_dims(frame, axis=0)  # add the batch dimension
    # Same scaling as ImageDataGenerator(rescale=1./255)
    frame = np.asarray(frame, dtype='float32')
    frame = frame / 255
    return frame
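
To see what the conversion changes, here is a small NumPy-only sketch on a synthetic image. For a 3-channel frame, reversing the last axis has the same effect as cv2.COLOR_BGR2RGB. The helper names bgr_to_rgb and prepare_frame_np are made up for this illustration, and the resize step is left out:

```python
import numpy as np


def bgr_to_rgb(frame):
    """Reverse the channel axis: BGR -> RGB."""
    return frame[..., ::-1]


def prepare_frame_np(frame_bgr):
    """Hypothetical helper: swap channels, add a batch axis, scale to [0, 1]."""
    rgb = bgr_to_rgb(frame_bgr)
    batch = np.expand_dims(rgb, axis=0).astype("float32")
    return batch / 255.0


# A pure red 2x2 image as OpenCV would hold it: (B, G, R) = (0, 0, 255)
red_bgr = np.zeros((2, 2, 3), dtype=np.uint8)
red_bgr[..., 2] = 255

wrong = red_bgr.astype("float32") / 255.0  # fed as-is, the model sees blue
right = prepare_frame_np(red_bgr)[0]       # channels swapped, the model sees red

print(wrong[0, 0])  # [0. 0. 1.]
print(right[0, 0])  # [1. 0. 0.]
```

Without the swap, a red sign reaches the network with its red and blue channels exchanged, which is consistent with a colored hazard sign going undetected.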

Original cats & dogs dataset: still works (I guess the colors just had no impact there, which is very interesting!)

Batch size: tested with 1 and 20, both work.

Thanks for all your comments - I really appreciate your help! I hope somebody else finds this useful.
