I am trying to create a CNN that can detect the digits in an image. For this I started working with the Street View House Numbers (SVHN) dataset. This dataset comes with pre-processed images of digits scaled to 32x32. There are 10 classes, one for each digit.
I trained the network and it gives a decent test accuracy of about 0.93. The test accuracy is calculated on the test set, which also consists of 32x32 digit images.
This is all good. The problem is that the predicted probability for one class is always 1. Here is what the output for one image looks like:
array([[0.0000000e+00, 0.0000000e+00, 0.0000000e+00, 0.0000000e+00,
1.0000000e+00, 8.5623318e-24, 0.0000000e+00, 0.0000000e+00,
0.0000000e+00, 2.4716297e-28]], dtype=float32)
As can be seen from this output, the probability for one of the classes is 1. This is fine for an image that actually contains the desired class, but a probability of 1 also occurs when there is no trace of a number in the image. For example, the following image is predicted as class 4 with a probability of 1. In fact, the distribution above is for the following image.
Image:
I have not been able to identify the reason for this. I am sharing the code I used to create the CNN.
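# Imports are assumed here; the original post does not show them.
import math
import tensorflow as tf
from tensorflow import keras

val_split_length = 10623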
num_train_samples = 73257
num_test_samples = 26032
total_classes = 10
model_prefix = "10c"
model = keras.Sequential()
# First Conv. Layer
model.add(keras.layers.Conv2D(filters = 96, kernel_size = (11,11), strides = (4,4), padding = "same", input_shape=(227,227,3)))
model.add(keras.layers.Activation("relu"))
model.add(keras.layers.BatchNormalization())
model.add(keras.layers.MaxPooling2D(pool_size = (3,3), strides = (2,2), padding="same"))
# ##More Conv. Layers ###
# First Fully Connected Layer
model.add(keras.layers.Flatten())
model.add(keras.layers.Dense(4096))
model.add(keras.layers.Activation("relu"))
model.add(keras.layers.Dropout(0.5))
## More Fully Connected Layers ###
# Third Fully Connected Layer
model.add(keras.layers.Dense(total_classes))
model.add(keras.layers.Activation("softmax"))
train_optimizer_adam = tf.train.AdamOptimizer(learning_rate=1e-3)
train_optimizer_rmsProp = keras.optimizers.RMSprop(lr=0.0001)
#https://keras.io/optimizers/
model.compile(loss="categorical_crossentropy", optimizer=train_optimizer_rmsProp, metrics=['accuracy'])
batch_size = 128 * 3
data_generator = keras.preprocessing.image.ImageDataGenerator(rescale = 1./255)
# https://keras.io/preprocessing/image/#flow_from_directory
train_generator = data_generator.flow_from_directory(
    'train',
    target_size=(227, 227),
    batch_size=batch_size,
    color_mode='rgb',
    class_mode='categorical',
    #save_to_dir="logs"
)
validation_generator = data_generator.flow_from_directory(
    'validation',
    target_size=(227, 227),
    batch_size=batch_size,
    color_mode='rgb',
    class_mode='categorical')
# https://keras.io/models/model/#fit_generator
history = model.fit_generator(
    train_generator,
    validation_data = validation_generator,
    validation_steps = math.ceil(val_split_length / batch_size),
    epochs = 5,
    steps_per_epoch = math.ceil(num_train_samples / batch_size),
    use_multiprocessing = True,
    workers = 8,
    callbacks = model_callbacks,
    verbose = 2
)
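For reference, the step counts passed to `fit_generator` above work out as follows. This is only the arithmetic from the constants in the post, written as a small standalone sketch:

```python
import math

# Constants copied from the post.
num_train_samples = 73257
val_split_length = 10623
batch_size = 128 * 3  # 384

# One step is one batch; ceil makes sure the last, smaller batch is included.
steps_per_epoch = math.ceil(num_train_samples / batch_size)
validation_steps = math.ceil(val_split_length / batch_size)

print(steps_per_epoch, validation_steps)  # 191 28
```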
To predict from the model above:
# Imports are assumed here; the original post does not show them.
import cv2
import numpy as np

img = cv2.imread("image.png")
img = cv2.resize(img, (227,227))
loaded_model = keras.models.load_model("saved-model-12-0.96.hdf5")
prob = loaded_model.predict_proba(np.expand_dims(img, axis = 0))
print(prob)
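One thing that may be worth ruling out is a mismatch between training-time and prediction-time preprocessing. The generators above use `rescale = 1./255` and `color_mode='rgb'`, while the prediction snippet passes the raw `cv2.imread` output, which is `uint8` in BGR channel order. Below is a minimal sketch of matching preprocessing, assuming the image has already been resized to the network input size. `prepare_for_model` is a hypothetical helper name, the random array is only a stand-in for a real image, and whether this mismatch explains the saturated output is not confirmed here.

```python
import numpy as np

def prepare_for_model(bgr_image, scale=1.0 / 255):
    """Turn an OpenCV-style BGR uint8 image into a model-ready batch.

    Assumes the image was already resized to the network input size
    (227x227 in the post). Hypothetical helper, not part of Keras or OpenCV.
    """
    rgb = bgr_image[..., ::-1]                # BGR -> RGB, matching color_mode='rgb'
    scaled = rgb.astype(np.float32) * scale   # same factor as rescale = 1./255
    return np.expand_dims(scaled, axis=0)     # add the batch dimension

# Stand-in for cv2.resize(cv2.imread("image.png"), (227, 227)).
dummy = np.random.randint(0, 256, size=(227, 227, 3), dtype=np.uint8)
batch = prepare_for_model(dummy)
print(batch.shape, batch.dtype)
```

The resulting `batch` could then be passed to `loaded_model.predict_proba` in place of `np.expand_dims(img, axis = 0)`.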
What could be the reason that I get such a high probability for a class that does not appear anywhere in the image? I understand that the model will always predict something, but why is the probability so high?
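To make the question concrete, here is a small NumPy sketch of how a softmax layer can output exactly 1 in `float32` once the logits become large. The logit values are made up for illustration, and the factor of 255 is only an assumption about how unscaled pixel inputs might inflate them. It is not a measurement from the model above.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    logits = np.asarray(logits, dtype=np.float32)
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

# Hypothetical logits, chosen only for illustration.
small_logits = np.array([1.0, 2.0, 3.0, 2.5], dtype=np.float32)
large_logits = small_logits * 255.0  # same ranking, much larger magnitude

p_small = softmax(small_logits)
p_large = softmax(large_logits)

print(p_small)  # spread over several classes
print(p_large)  # the other entries underflow to 0 in float32
```

Both inputs rank the classes identically, so the predicted class stays the same. Only the magnitude of the logits changes how confident the output looks.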