1 vote

Read this text only after reading everything else: "check the second model at the end, where I try to classify cars and planes from the cifar10 dataset; that model is reproducible."

I am trying to build a model that classifies cats and dogs, something that should not be a real problem. Here is what I am doing:
I created a folder with two labeled subfolders: cats and dogs. In each folder I have 1000 images of cats/dogs.
I iteratively built a NumPy array into which I put my images after converting them to arrays (I chose a (200, 200, 3) size for each image), and I normalized the array by dividing it by 255. So I got a (2000, 200, 200, 3) normalized array.
Then I created an array for the labels. Since I have two categories, each row of the array holds two binary digits: (1, 0) if a cat and (0, 1) if a dog. So I found myself with a (2000, 2) array of labels.
Next, I create X_train, Y_train and X_valid, Y_valid (70% for training and 30% for validation).
Then I create a neural network with this architecture:
Dense(200x200x3, 1000, relu) >>> Dense(1000, 500, sigmoid) >>> Dense(500, 150, sigmoid) >>> Dense(150, 2, softmax); backprop: loss=categorical_crossentropy, optimizer=adam.

Up to this point everything looks fine and the model trains. But then, when I try to predict values, the model always gives back the same value whatever the input. Even if I predict elements from the training set, I always get the same constant output: array([[0.5188029, 0.48119715]]).

I really need help; I don't know why this is happening. To guide you, I'll write down all the code corresponding to what I did:

The preprocessing function: preprocess_image

import imghdr
import numpy as np
from PIL import Image

def preprocess_image(img_path, model_image_size):
    image_type = imghdr.what(img_path)  # detect the file type (not used below)
    image = Image.open(img_path)
    # PIL's resize expects (width, height), hence the reversed tuple
    resized_image = image.resize(tuple(reversed(model_image_size)), Image.BICUBIC)
    image_data = np.array(resized_image, dtype='float32')
    image_data /= 255.  # normalize pixel values to [0, 1]
    image_data = np.expand_dims(image_data, 0)  # add batch dimension
    return image, image_data
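
For instance, a quick usage sketch (assuming the folder layout described above):

img, img_data = preprocess_image('kaggle/PetImages/Cat/1000.jpg', model_image_size=(200, 200))
print(img_data.shape)  # expected: (1, 200, 200, 3), pixel values in [0, 1]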

###################################### import libraries ##########################################################
import argparse
import os
import numpy as np
import pandas as pd
import scipy.io
import scipy.misc
from scipy.misc import imread
import matplotlib.pyplot as plt
from matplotlib.pyplot import imshow
import PIL
import tensorflow as tf
import keras
from keras import backend as K
from keras.preprocessing import image
from keras.models import Sequential, Model, load_model
from keras.layers import Input, Lambda, Conv2D, Dense
# YOLO-project helpers (not used by the classifier below); note that this import
# re-binds the name preprocess_image defined above
from yolo_utils import read_classes, read_anchors, generate_colors, preprocess_image, draw_boxes, scale_boxes
from yad2k.models.keras_yolo import yolo_head, yolo_boxes_to_corners, preprocess_true_boxes, yolo_loss, yolo_body

%matplotlib inline

Importing the images

train_img=[]
# cats: images 1000..1999 from the Cat folder
for i in range(1000,2000):
    (img, train_img_data)=preprocess_image('kaggle/PetImages/Cat/'+str(i)+'.jpg',model_image_size = (200, 200))
    train_img.append(train_img_data)
# dogs: images 0..999 from the Dog folder
for i in range(1000):
    (img, train_img_data)=preprocess_image('kaggle/PetImages/Dog/'+str(i)+'.jpg',model_image_size = (200, 200))
    train_img.append(train_img_data)
# each entry has shape (1, 200, 200, 3); stack and drop the extra batch dimension
train_img= np.array(train_img).reshape(2000,200,200,3)

Creating the training and validation sets

x_train= train_img.reshape(train_img.shape[0],-1)  # flatten each image to a 120000-vector
y_train = np.zeros((2000, 2))
for i in range(1000):          # first 1000 rows are cats: (1, 0)
    y_train[i,0]=1
for i in range(1000,2000):     # last 1000 rows are dogs: (0, 1)
    y_train[i,1]=1
from sklearn.model_selection import train_test_split
X_train, X_valid, Y_train, Y_valid=train_test_split(x_train,y_train,test_size=0.3, random_state=42)

Creating the model structure (using Keras)

from keras.layers import Dense, Activation
model=Sequential()
model.add(Dense(1000, input_dim=200*200*3, activation='relu',kernel_initializer='uniform'))
keras.layers.core.Dropout(0.3, noise_shape=None, seed=None)
model.add(Dense(500,input_dim=1000,activation='sigmoid'))
keras.layers.core.Dropout(0.4, noise_shape=None, seed=None)
model.add(Dense(150,input_dim=500,activation='sigmoid'))
keras.layers.core.Dropout(0.2, noise_shape=None, seed=None)
model.add(Dense(units=2))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer="adam", metrics=['accuracy'])

Fitting the model

model.fit(X_train, Y_train, epochs=20, batch_size=50,validation_data=(X_valid,Y_valid))

Output of fitting the model

Train on 1400 samples, validate on 600 samples
Epoch 1/20
1400/1400 [==============================] - 73s 52ms/step - loss: 0.8065 - acc: 0.4814 - val_loss: 0.6939 - val_acc: 0.5033
Epoch 2/20
1400/1400 [==============================] - 72s 51ms/step - loss: 0.7166 - acc: 0.5043 - val_loss: 0.7023 - val_acc: 0.4967
Epoch 3/20
1400/1400 [==============================] - 73s 52ms/step - loss: 0.6969 - acc: 0.5214 - val_loss: 0.6966 - val_acc: 0.4967
Epoch 4/20
1400/1400 [==============================] - 71s 51ms/step - loss: 0.6986 - acc: 0.4857 - val_loss: 0.6932 - val_acc: 0.4967
Epoch 5/20
1400/1400 [==============================] - 74s 53ms/step - loss: 0.7018 - acc: 0.4686 - val_loss: 0.7080 - val_acc: 0.4967
Epoch 6/20
1400/1400 [==============================] - 76s 54ms/step - loss: 0.7041 - acc: 0.4843 - val_loss: 0.6931 - val_acc: 0.5033
Epoch 7/20
1400/1400 [==============================] - 73s 52ms/step - loss: 0.7002 - acc: 0.4771 - val_loss: 0.6973 - val_acc: 0.4967
Epoch 8/20
1400/1400 [==============================] - 70s 50ms/step - loss: 0.7039 - acc: 0.5014 - val_loss: 0.6931 - val_acc: 0.5033
Epoch 9/20
1400/1400 [==============================] - 72s 51ms/step - loss: 0.6983 - acc: 0.4971 - val_loss: 0.7109 - val_acc: 0.5033
Epoch 10/20
1400/1400 [==============================] - 72s 51ms/step - loss: 0.7063 - acc: 0.4986 - val_loss: 0.7151 - val_acc: 0.4967
Epoch 11/20
1400/1400 [==============================] - 78s 55ms/step - loss: 0.6984 - acc: 0.5043 - val_loss: 0.7026 - val_acc: 0.5033
Epoch 12/20
1400/1400 [==============================] - 78s 55ms/step - loss: 0.6993 - acc: 0.4929 - val_loss: 0.6958 - val_acc: 0.4967
Epoch 13/20
1400/1400 [==============================] - 90s 65ms/step - loss: 0.7000 - acc: 0.4843 - val_loss: 0.6970 - val_acc: 0.4967
Epoch 14/20
1400/1400 [==============================] - 78s 56ms/step - loss: 0.7052 - acc: 0.4829 - val_loss: 0.7029 - val_acc: 0.4967
Epoch 15/20
1400/1400 [==============================] - 80s 57ms/step - loss: 0.7003 - acc: 0.5014 - val_loss: 0.6993 - val_acc: 0.5033
Epoch 16/20
1400/1400 [==============================] - 77s 55ms/step - loss: 0.6933 - acc: 0.5200 - val_loss: 0.6985 - val_acc: 0.5033
Epoch 17/20
1400/1400 [==============================] - 78s 56ms/step - loss: 0.6962 - acc: 0.4871 - val_loss: 0.7086 - val_acc: 0.4967
Epoch 18/20
1400/1400 [==============================] - 81s 58ms/step - loss: 0.6987 - acc: 0.4971 - val_loss: 0.7119 - val_acc: 0.4967
Epoch 19/20
1400/1400 [==============================] - 77s 55ms/step - loss: 0.7010 - acc: 0.5171 - val_loss: 0.6969 - val_acc: 0.4967
Epoch 20/20
1400/1400 [==============================] - 74s 53ms/step - loss: 0.6984 - acc: 0.5057 - val_loss: 0.6936 - val_acc: 0.5033
<keras.callbacks.History at 0x23903fc7c88>

Prediction on elements of training set:

print(model.predict(X_train[240].reshape(1,120000)))
print(model.predict(X_train[350].reshape(1,120000)))
print(model.predict(X_train[555].reshape(1,120000)))
print(model.predict(X_train[666].reshape(1,120000)))
print(model.predict(X_train[777].reshape(1,120000)))

Output of these operations

[[0.5188029  0.48119715]]
[[0.5188029  0.48119715]]
[[0.5188029  0.48119715]]
[[0.5188029  0.48119715]]
[[0.5188029  0.48119715]]

Prediction on elements of validation set

print(model.predict(X_valid[10].reshape(1,120000)))
print(model.predict(X_valid[20].reshape(1,120000)))
print(model.predict(X_valid[30].reshape(1,120000)))
print(model.predict(X_valid[40].reshape(1,120000)))
print(model.predict(X_valid[50].reshape(1,120000)))

Output of these operations

[[0.5188029  0.48119715]]
[[0.5188029  0.48119715]]
[[0.5188029  0.48119715]]
[[0.5188029  0.48119715]]
[[0.5188029  0.48119715]]

I am really confused because I don't know why I get this result. I also tried another classification task for gender (men/women) and got a similar result; in other words, I get a fixed output whatever the value of the input (basically telling me that all observations are women)...

Here is the part I was talking about at the beginning of the post:
Classifying cars and planes (reproducible)

#importing keras cifar
from keras.datasets import cifar10
(x_train, y_train), (x_test, y_test) = cifar10.load_data()


#Building labeled arrays of cars and planes: the first 5000 are planes, the last 5000 are cars

#Building x_train_our
a=(y_train==0)                  # boolean mask for class 0 (plane)
x_plane=x_train[list(a[:,0])]
a=(y_train==1)                  # boolean mask for class 1 (car)
x_car=x_train[list(a[:,0])]
x_train_our=np.append(x_plane,x_car,axis=0)


#Building y_train_our
y_train_our = np.zeros((10000, 2))
for i in range(5000):
    y_train_our[i,0]=1
for i in range(5000,10000):
    y_train_our[i,1]=1
print('x_train_our shape: ',x_train_our.shape)
print('y_train_our shape: ',y_train_our.shape)

#Train set and valid set
x_train_our= x_train_our.reshape(x_train_our.shape[0],-1)
y_train_our=y_train_our
print('x_train_our shape: ',x_train_our.shape)
print('y_train_our shape: ',y_train_our.shape)
from sklearn.model_selection import train_test_split
X_train_our, X_valid_our, Y_train_our, Y_valid_our=train_test_split(x_train_our,y_train_our,test_size=0.3, random_state=42)

#checking the size and shape of the different elements
print("-------------testing size and shape of the different elements: ")
print("-------------x_train_our size: ",x_train_our.shape)
print("-------------y_train_our size: ",y_train_our.shape)
print("-------------X_train_our size: ",X_train_our.shape)
print("-------------X_valid_our size: ",X_valid_our.shape)
print("-------------Y_train_our size: ",Y_train_our.shape)
print("-------------Y_valid_our size: ",Y_valid_our.shape)


#Model 1: creating an MLP model which will later serve as the output stage for the YOLO model
from keras.layers import Dense, Activation
model=Sequential()
model.add(Dense(1000, input_dim=32*32*3, activation='relu',kernel_initializer='uniform'))
keras.layers.core.Dropout(0.3, noise_shape=None, seed=None)
model.add(Dense(500,input_dim=1000,activation='sigmoid'))
keras.layers.core.Dropout(0.4, noise_shape=None, seed=None)
model.add(Dense(150,input_dim=500,activation='sigmoid'))
keras.layers.core.Dropout(0.2, noise_shape=None, seed=None)
model.add(Dense(units=2))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer="adam", metrics=['accuracy'])

# fitting the model 

model.fit(X_train_our, Y_train_our, epochs=20, batch_size=10,validation_data=(X_valid_our,Y_valid_our))

#Build test set
a=(y_test==0)
x_test_plane=x_test[list(a[:,0])]
a=(y_test==1)
x_test_car=x_test[list(a[:,0])]

# Test model
# note: this loop predicts on x_plane/x_car (training images), not on the
# freshly built x_test_plane/x_test_car
for i in range(1000):
    print('it should be a plane: ',model.predict(x_plane[i].reshape(1,-1)))
for i in range(1000):
    print('it should be a car: ',model.predict(x_car[i].reshape(1,-1)))

Output

x_train_our shape:  (10000, 32, 32, 3)
y_train_our shape:  (10000, 2)
x_train_our shape:  (10000, 3072)
y_train_our shape:  (10000, 2)
-------------testing size and shape of the different elements: 
-------------x_train_our size:  (10000, 3072)
-------------y_train_our size:  (10000, 2)
-------------X_train_our size:  (7000, 3072)
-------------X_valid_our size:  (3000, 3072)
-------------Y_train_our size:  (7000, 2)
-------------Y_valid_our size:  (3000, 2)
Train on 7000 samples, validate on 3000 samples
Epoch 1/20
7000/7000 [==============================] - 52s 7ms/step - loss: 0.7114 - acc: 0.4907 - val_loss: 0.7237 - val_acc: 0.4877
Epoch 2/20
7000/7000 [==============================] - 51s 7ms/step - loss: 0.7004 - acc: 0.4967 - val_loss: 0.7065 - val_acc: 0.4877
Epoch 3/20
7000/7000 [==============================] - 51s 7ms/step - loss: 0.6979 - acc: 0.4981 - val_loss: 0.6977 - val_acc: 0.4877
Epoch 4/20
7000/7000 [==============================] - 52s 7ms/step - loss: 0.6990 - acc: 0.4959 - val_loss: 0.6970 - val_acc: 0.4877
Epoch 5/20
7000/7000 [==============================] - 53s 8ms/step - loss: 0.6985 - acc: 0.5030 - val_loss: 0.6929 - val_acc: 0.5123
Epoch 6/20
7000/7000 [==============================] - 52s 7ms/step - loss: 0.6970 - acc: 0.5036 - val_loss: 0.7254 - val_acc: 0.4877
Epoch 7/20
7000/7000 [==============================] - 51s 7ms/step - loss: 0.6968 - acc: 0.5047 - val_loss: 0.6935 - val_acc: 0.5123
Epoch 8/20
7000/7000 [==============================] - 47s 7ms/step - loss: 0.6970 - acc: 0.5076 - val_loss: 0.6941 - val_acc: 0.5123
Epoch 9/20
7000/7000 [==============================] - 50s 7ms/step - loss: 0.6982 - acc: 0.5024 - val_loss: 0.6928 - val_acc: 0.5123
Epoch 10/20
7000/7000 [==============================] - 47s 7ms/step - loss: 0.6974 - acc: 0.5010 - val_loss: 0.7222 - val_acc: 0.4877
Epoch 11/20
7000/7000 [==============================] - 51s 7ms/step - loss: 0.6975 - acc: 0.5087 - val_loss: 0.6936 - val_acc: 0.4877
Epoch 12/20
7000/7000 [==============================] - 49s 7ms/step - loss: 0.6991 - acc: 0.5021 - val_loss: 0.6938 - val_acc: 0.4877
Epoch 13/20
7000/7000 [==============================] - 49s 7ms/step - loss: 0.6976 - acc: 0.4996 - val_loss: 0.6983 - val_acc: 0.4877
Epoch 14/20
7000/7000 [==============================] - 49s 7ms/step - loss: 0.6978 - acc: 0.5064 - val_loss: 0.6944 - val_acc: 0.5123
Epoch 15/20
7000/7000 [==============================] - 49s 7ms/step - loss: 0.6993 - acc: 0.5019 - val_loss: 0.6937 - val_acc: 0.5123
Epoch 16/20
7000/7000 [==============================] - 49s 7ms/step - loss: 0.6969 - acc: 0.5027 - val_loss: 0.6930 - val_acc: 0.5123
Epoch 17/20
7000/7000 [==============================] - 49s 7ms/step - loss: 0.6981 - acc: 0.4939 - val_loss: 0.6953 - val_acc: 0.4877
Epoch 18/20
7000/7000 [==============================] - 51s 7ms/step - loss: 0.6969 - acc: 0.5030 - val_loss: 0.7020 - val_acc: 0.4877
Epoch 19/20
7000/7000 [==============================] - 51s 7ms/step - loss: 0.6984 - acc: 0.5039 - val_loss: 0.6973 - val_acc: 0.5123
Epoch 20/20
7000/7000 [==============================] - 51s 7ms/step - loss: 0.6981 - acc: 0.5053 - val_loss: 0.6940 - val_acc: 0.5123
it should be a plane:  [[0.5358367  0.46416324]]
it should be a plane:  [[0.5358367  0.46416324]]
it should be a plane:  [[0.5358367  0.46416324]]
it should be a plane:  [[0.5358367  0.46416324]]
it should be a plane:  [[0.5358367  0.46416324]]
it should be a plane:  [[0.5358367  0.46416324]]
it should be a plane:  [[0.5358367  0.46416324]]
it should be a plane:  [[0.5358367  0.46416324]]
it should be a plane:  [[0.5358367  0.46416324]]
it should be a plane:  [[0.5358367  0.46416324]]
it should be a plane:  [[0.5358367  0.46416324]]
it should be a plane:  [[0.5358367  0.46416324]]
it should be a plane:  [[0.5358367  0.46416324]]
it should be a plane:  [[0.5358367  0.46416324]]
it should be a car:  [[0.5358367  0.46416324]]
it should be a car:  [[0.5358367  0.46416324]]
it should be a car:  [[0.5358367  0.46416324]]
it should be a car:  [[0.5358367  0.46416324]]
it should be a car:  [[0.5358367  0.46416324]]
it should be a car:  [[0.5358367  0.46416324]]
it should be a car:  [[0.5358367  0.46416324]]
it should be a car:  [[0.5358367  0.46416324]]
it should be a car:  [[0.5358367  0.46416324]]
it should be a car:  [[0.5358367  0.46416324]]
Comments

Very unclear what's wrong here. – AlexGera

Why is the model always predicting the same output whatever the input?! In other words, the model says I have cats all the time (if I use 0.5 as a decision border). – Moni93

You have 50% accuracy in the last epoch, that's why... You should try more epochs, change the learning rate, look at the loss function... Do a little more research. – Eric

Have you checked (by plotting) that the images you are trying to predict are, in fact, different images? – Echows

@Echows yes I did! – Moni93

2 Answers

0 votes

I can confirm this happens on Keras==2.1.5, tensorflow==1.6.0.

Short answer: this is an overfitting problem, and I managed to solve it for the cifar10 dataset by lowering the learning rate to 0.0001 or by swapping the adam optimizer for SGD.
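
For concreteness, a minimal sketch of both fixes using the standard Keras optimizer classes (use one compile call or the other):

    from keras.optimizers import Adam, SGD

    # Option 1: Adam with the lower learning rate
    model.compile(loss='categorical_crossentropy',
                  optimizer=Adam(lr=0.0001),
                  metrics=['accuracy'])

    # Option 2: plain SGD instead of Adam
    # model.compile(loss='categorical_crossentropy',
    #               optimizer=SGD(),
    #               metrics=['accuracy'])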

First, a few convenient modifications that do not make the problem vanish:

  • Set batch_size=2048 to accelerate the epochs.
  • Set epochs=5 to accelerate the training.
  • Only display the first 10 test predictions.

My guess was that a network with 32*32*3*1000 parameters in the first layer alone was too easily overfitted with lr=0.001. So I swapped the cifar10 dataset for mnist, with an input shape of 28*28, i.e. a first layer with 28*28*1000 weights.
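
Roughly, the swap looks like this (a sketch reconstructing the data prep; mnist digit 0 stands in for "plane" and digit 1 for "car", so the print labels below stay unchanged):

    from keras.datasets import mnist
    import numpy as np

    (x_train, y_train), (x_test, y_test) = mnist.load_data()

    # digits 0 and 1 play the roles of "plane" and "car"
    x_plane = x_train[y_train == 0]
    x_car = x_train[y_train == 1]
    x_train_our = np.append(x_plane, x_car, axis=0)

    # same one-hot label layout as in the question
    y_train_our = np.zeros((len(x_train_our), 2))
    y_train_our[:len(x_plane), 0] = 1
    y_train_our[len(x_plane):, 1] = 1

    # flatten; the first Dense layer now takes input_dim=28*28
    x_train_our = x_train_our.reshape(x_train_our.shape[0], -1)

Here is the result: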

Train on 8865 samples, validate on 3800 samples
Epoch 1/5
 - 1s - loss: 0.3029 - acc: 0.8831 - val_loss: 0.0339 - val_acc: 0.9958
...
Epoch 5/5
 - 1s - loss: 0.0063 - acc: 0.9982 - val_loss: 0.0039 - val_acc: 0.9987

it should be a plane:  [[0.9984061  0.00159395]]
it should be a plane:  [[0.99826896 0.00173102]]
it should be a plane:  [[0.9980952  0.00190475]]
it should be a plane:  [[0.9984674  0.00153262]]
it should be a plane:  [[0.99838233 0.00161765]]
it should be a plane:  [[0.9981931  0.00180687]]
it should be a plane:  [[0.9982863  0.00171365]]
it should be a plane:  [[0.9956332  0.00436677]]
it should be a plane:  [[0.9982967  0.00170333]]
it should be a plane:  [[0.9983923  0.00160768]]
it should be a car:  [[0.00104721 0.99895275]]
it should be a car:  [[0.00099913 0.99900085]]
it should be a car:  [[9.910525e-04 9.990089e-01]]
it should be a car:  [[9.878672e-04 9.990121e-01]]
it should be a car:  [[0.00105713 0.9989429 ]]
it should be a car:  [[0.02821341 0.9717866 ]]
it should be a car:  [[9.509333e-04 9.990490e-01]]
it should be a car:  [[0.00103957 0.9989604 ]]
it should be a car:  [[8.8129757e-04 9.9911875e-01]]
it should be a car:  [[0.00189029 0.9981097 ]]

So then I made two more runs: adam-0.0001 (with lr=0.0001) and sgd, using the SGD optimizer. The image below illustrates how the predictions from these two stay spread out across the epochs, as opposed to your implementation using adam(lr=0.001):

[Figure: activations throughout the epochs]

The image below shows how the gradients decrease much faster for adam:

[Figure: gradient magnitudes throughout the epochs]

This probably got the network stuck in a local minimum, where it simulates a constant function with respect to its inputs.
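
A quick way to check for this kind of collapse (a small diagnostic sketch, assuming the model and X_valid_our from above are in scope):

    import numpy as np

    # If the network has collapsed to a constant function, every input maps to
    # (almost) the same softmax output, so only one distinct row survives
    preds = model.predict(X_valid_our)
    distinct = np.unique(preds.round(4), axis=0)
    print(distinct.shape[0])  # 1 => the same output for every input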

A few other comments on your code:

  • The following code has no effect:

    keras.layers.core.Dropout(0.3, noise_shape=None, seed=None)
    

    You need to add it to the model:

    from keras.layers import Dropout
    model.add(Dropout(...))
    
  • No need to set input_dim in each layer, just the first (both fixes are folded into the sketch after this list).

  • TensorBoard is available, so you can catch this kind of problem:

    from keras import callbacks
    model.fit(X_train_our, Y_train_our,
              epochs=5,
              batch_size=2048,
              validation_data=(X_valid_our, Y_valid_our),
              callbacks=[
                  callbacks.TensorBoard('./logs/adam', histogram_freq=1, batch_size=2048, write_grads=True)
              ])
    
  • As mentioned above, it's not a good idea to feed images (or any raw data) directly into Dense layers. Layers with fewer parameters (e.g. Conv2D, LocallyConnected2D) are a better fit.
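
Putting these points together, a corrected version of the model might look like this (a sketch keeping the original layer sizes and dropout rates, with the lower learning rate from above):

    from keras.models import Sequential
    from keras.layers import Dense, Dropout, Activation
    from keras.optimizers import Adam

    model = Sequential()
    model.add(Dense(1000, input_dim=32*32*3, activation='relu',
                    kernel_initializer='uniform'))
    model.add(Dropout(0.3))  # the Dropout layers are now actually in the model
    model.add(Dense(500, activation='sigmoid'))
    model.add(Dropout(0.4))
    model.add(Dense(150, activation='sigmoid'))
    model.add(Dropout(0.2))
    model.add(Dense(2))
    model.add(Activation('softmax'))
    model.compile(loss='categorical_crossentropy',
                  optimizer=Adam(lr=0.0001),
                  metrics=['accuracy'])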

0 votes

Why are you trying image classification with an ordinary dense neural network rather than a CNN? CNNs are built for image-related tasks. Furthermore, once you've built your model, you cannot just dump a raw image into the predict method during validation and expect a result. You need to load the image, convert it into an array, check the dimensionality of the input your network expects, and add dimensions to your test image(s) accordingly.
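
For illustration, a minimal CNN sketch in Keras for the two-class cats/dogs setup from the question (the layer sizes here are arbitrary starting choices, not a tuned architecture):

    from keras.models import Sequential
    from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

    model = Sequential()
    # Convolutions work on the (200, 200, 3) images directly; no flattening first
    model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(200, 200, 3)))
    model.add(MaxPooling2D((2, 2)))
    model.add(Conv2D(64, (3, 3), activation='relu'))
    model.add(MaxPooling2D((2, 2)))
    model.add(Flatten())
    model.add(Dense(64, activation='relu'))
    model.add(Dense(2, activation='softmax'))
    model.compile(loss='categorical_crossentropy', optimizer='adam',
                  metrics=['accuracy'])

    # At prediction time the input must match the training shape (batch, 200, 200, 3),
    # e.g. the (1, 200, 200, 3) array returned by preprocess_image above:
    # model.predict(image_data)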