You are basically trying to solve a regression problem. Apart from what you have already done, there are a few other things you can try:
- Use image augmentation techniques to generate more data, and normalize the images.
- Make a deeper model with a few more convolution layers.
- Use a proper weight initializer, such as He-normal, for the convolution layers.
- Use BatchNormalization between layers so that the activations flowing through the network have roughly zero mean and unit standard deviation.
- Use a cross-entropy loss, as it gives better-behaved gradients; with MSE the gradients tend to become very small over time, even though MSE is usually the preferred choice for regression problems.
- Try changing the optimizer to Adam.
- If your dataset has more classes and suffers from class imbalance, you can use focal loss, a variant of cross-entropy that penalizes misclassified examples more heavily than correctly classified ones. Reducing the batch size and upsampling the minority classes should also help.
- Use Bayesian optimization techniques for hyperparameter tuning of your model.
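For the augmentation point, here is a minimal NumPy sketch (deliberately library-agnostic — in Keras you would typically use `ImageDataGenerator` instead) of random-shift augmentation plus `[0, 1]` normalization; the shift range and array shapes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(images, max_shift=2):
    """Randomly shift each image by up to max_shift pixels in x and y
    (a simple, label-preserving augmentation for digit images)."""
    out = np.empty_like(images)
    for i, img in enumerate(images):
        dx, dy = rng.integers(-max_shift, max_shift + 1, size=2)
        out[i] = np.roll(np.roll(img, dy, axis=0), dx, axis=1)
    return out

# normalize pixel values to [0, 1], as in the sample code below
images = rng.integers(0, 256, size=(8, 28, 28)).astype('float32')
images /= 255.0

augmented = augment(images)
```

Concatenating `augmented` with the original batch effectively doubles the training data for those images.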
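The focal-loss idea can be sketched in plain NumPy for the binary case; `gamma` (focusing) and `alpha` (class balance) are the usual parameters, and the values here are just the common defaults from the original paper:

```python
import numpy as np

def focal_loss(y_true, y_pred, gamma=2.0, alpha=0.25, eps=1e-7):
    """Binary focal loss: (1 - p_t)^gamma down-weights examples the
    model already classifies confidently, so hard examples dominate."""
    y_pred = np.clip(y_pred, eps, 1 - eps)
    pt = np.where(y_true == 1, y_pred, 1 - y_pred)      # prob. of the true class
    weight = np.where(y_true == 1, alpha, 1 - alpha)    # class-balance weight
    return -np.mean(weight * (1 - pt) ** gamma * np.log(pt))

y_true = np.array([1, 1, 0, 0])
confident = np.array([0.95, 0.90, 0.10, 0.05])  # mostly well classified
uncertain = np.array([0.60, 0.55, 0.45, 0.40])  # near the decision boundary
```

Comparing `focal_loss(y_true, uncertain)` with `focal_loss(y_true, confident)` shows the loss concentrating on the poorly classified examples.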
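For hyperparameter tuning you would normally reach for a library (e.g. scikit-optimize or Keras Tuner), but the core Bayesian-optimization loop — fit a surrogate to past trials, then evaluate the point with the highest expected improvement — can be sketched in NumPy. The 1-D objective below is a made-up stand-in for validation loss as a function of log10(learning rate):

```python
import numpy as np
from math import erf

def gp_posterior(X_obs, y_obs, X_query, length=0.3, noise=1e-4):
    """Posterior mean/std of a 1-D Gaussian-process surrogate (RBF kernel)."""
    def k(a, b):
        return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length ** 2)
    K_inv = np.linalg.inv(k(X_obs, X_obs) + noise * np.eye(len(X_obs)))
    Ks = k(X_obs, X_query)
    mu = Ks.T @ K_inv @ y_obs
    var = 1.0 - np.sum(Ks * (K_inv @ Ks), axis=0)
    return mu, np.sqrt(np.clip(var, 1e-12, None))

def expected_improvement(mu, sigma, best):
    """EI acquisition for minimization."""
    z = (best - mu) / sigma
    phi = np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)          # normal pdf
    Phi = 0.5 * (1 + np.vectorize(erf)(z / np.sqrt(2)))       # normal cdf
    return (best - mu) * Phi + sigma * phi

def objective(x):
    # hypothetical validation loss vs. log10(lr); minimum near lr = 1e-3
    return 0.1 * (x + 3.0) ** 2 + 0.05

grid = np.linspace(-5, -1, 200)
X_obs = np.array([-5.0, -1.0])          # two initial trials at the edges
y_obs = objective(X_obs)
for _ in range(10):
    mu, sigma = gp_posterior(X_obs, y_obs, grid)
    x_next = grid[np.argmax(expected_improvement(mu, sigma, y_obs.min()))]
    X_obs = np.append(X_obs, x_next)
    y_obs = np.append(y_obs, objective(x_next))

best_log_lr = X_obs[np.argmin(y_obs)]
```

After a handful of trials the best observed point sits near log10(lr) = -3, i.e. the sampler homes in on the good region instead of sweeping a full grid.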
A sample model code:

import os
import pickle

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, BatchNormalization, Dropout, Flatten, Dense
from keras.optimizers import Adam
from keras.utils import to_categorical

def load_data():
    with open(os.path.join(DATA_DIR, 'mnist.pickle'), 'rb') as fr:
        X_train, Y_train, X_val, Y_val = pickle.load(fr)
    # reshape to (samples, height, width, channels) so the Conv2D layers
    # in build_model() below can consume the images
    X_train = X_train.reshape(60000, 28, 28, 1).astype('float32') / 255
    X_val = X_val.reshape(10000, 28, 28, 1).astype('float32') / 255
    nb_classes = 10
    Y_train = to_categorical(Y_train, nb_classes)
    Y_val = to_categorical(Y_val, nb_classes)
    return X_train, Y_train, X_val, Y_val
def build_model(input_shape, classes, dropout=True):
    model = Sequential()
    model.add(Conv2D(32, (5, 5), activation='relu', kernel_initializer='he_uniform', padding='valid', input_shape=input_shape))
    model.add(BatchNormalization())
    model.add(MaxPooling2D((2, 2), strides=1, padding='valid'))
    if dropout:
        model.add(Dropout(0.2))
    model.add(Conv2D(64, (3, 3), activation='relu', kernel_initializer='he_uniform', padding='valid'))
    model.add(Conv2D(128, (3, 3), activation='relu', kernel_initializer='he_uniform', padding='valid'))
    model.add(BatchNormalization())
    model.add(MaxPooling2D((2, 2), strides=2, padding='valid'))
    if dropout:
        model.add(Dropout(0.2))
    model.add(Flatten())
    model.add(Dense(100, activation='relu', kernel_initializer='he_uniform'))
    model.add(BatchNormalization())
    model.add(Dense(classes, activation='softmax', kernel_initializer='he_uniform'))
    # optimizer = SGD(lr=0.01, decay=1e-6, momentum=0.9)
    optimizer = Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=None, decay=0.0, amsgrad=False)
    model.compile(optimizer=optimizer, loss='categorical_crossentropy', metrics=['accuracy'])
    return model
Comments (Marcin Możejko):
- When you use softmax you add a spurious condition to your output, namely the coordinates summing up to 1.
- softmax or sigmoid?
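The point raised in the comments — softmax constrains the outputs to sum to 1, while independent sigmoids impose no such condition — can be checked directly with a few made-up logits:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # subtract max for numerical stability
    return e / e.sum()

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

logits = np.array([2.0, -1.0, 0.5])
p_soft = softmax(logits)   # coordinates forced to sum to 1
p_sig = sigmoid(logits)    # each output squashed independently
```

So if your target values are genuinely independent coordinates (as in a regression problem), softmax adds a constraint the data does not satisfy, while per-output sigmoids (or a linear output layer) do not.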