I am working on a deep learning (CNN + AEs) approach for facial images.

My network has:

  • an input layer of 112*112*3 facial images

  • 3 blocks of convolution + max pooling + ReLU

  • 2 fully connected layers with 512 neurons and 50% dropout to avoid overfitting, followed by an output layer with 10 neurons, since I have 10 classes

  • a loss that is the reduced mean of the softmax cross-entropy, plus L2 regularization
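For readers who want to sanity-check the loss values, here is a minimal pure-Python sketch of "mean softmax cross-entropy plus L2". It is not the asker's code: the function names and the `l2_scale` value are assumptions made for illustration, and the L2 term follows the `sum(w ** 2) / 2` convention used by `tf.nn.l2_loss`.

```python
import math

def softmax_cross_entropy(logits, label):
    """Cross-entropy of one example, computed from raw logits."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]

def total_loss(batch_logits, labels, weights, l2_scale=5e-4):
    """Mean cross-entropy over the batch plus an L2 penalty.

    l2_scale is a hypothetical value, not taken from the question.
    """
    ce = [softmax_cross_entropy(z, y) for z, y in zip(batch_logits, labels)]
    data_loss = sum(ce) / len(ce)
    # Same convention as tf.nn.l2_loss: sum(w ** 2) / 2
    l2_penalty = l2_scale * sum(w * w for w in weights) / 2.0
    return data_loss + l2_penalty
```

With 10 classes, a network that outputs identical logits for every class has a cross-entropy of ln(10), roughly 2.30, so a data loss that sits near that value may indicate the network has not learned anything yet.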

I divided my dataset into 3 groups:

  1. 60% for training
  2. 20% for validation
  3. 20% for evaluation
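A 60/20/20 split like the one above can be sketched as follows. This is only an illustration: `split_dataset` and its parameters are hypothetical names, and shuffling before splitting is an assumption (the question does not say how the split was made). Shuffling matters if the data is stored ordered by class, because an unshuffled split could then leave the validation set with a different class mix than the training set.

```python
import random

def split_dataset(samples, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle, then split into train / validation / evaluation parts."""
    items = list(samples)
    random.Random(seed).shuffle(items)  # fixed seed keeps the split reproducible
    n_train = round(len(items) * train_frac)
    n_val = round(len(items) * val_frac)
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    evaluation = items[n_train + n_val:]  # whatever remains (about 20%)
    return train, val, evaluation
```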

The problem is that after a few epochs the validation error rate stays at a fixed value and never changes. I implemented the project in TensorFlow.

I haven't had this problem with CNNs before, so this is the first time I have seen it. I have checked the code, and it is based on the TensorFlow documentation, so I don't think the problem is in the code. Maybe I need to change some parameters, but I am not sure.

Any ideas about common solutions for this kind of problem?

Update: I changed the optimizer from momentum to Adam with the default learning rate. Now the validation error changes, but it is lower than the mini-batch error most of the time, even though both use the same batch size.

I have tested the model with and without biases (initialized to 0.1), but there is still no good fit.

Update: I fixed the issue. I will update with more details soon.

Is it predicting the same class every time? What is your learning rate? - chris
@chris_anderson I've just checked: yes, it predicts the same class every time. The learning rate is 0.01 and it decreases gradually. - Mohammad Javidan Darugar
Does your validation rate decrease at the beginning and then stop at a certain epoch, or is it the same from the first iteration? - Feras
It changes for just a couple of epochs, then stays the same until the maximum number of epochs. - Mohammad Javidan Darugar

1 Answer

One common approach that I have found helpful for this type of problem is TensorBoard. You can add summaries that visualize training performance after each epoch at different points in the computational graph. Adding key metrics is worth it, since you can see how training progresses after you change the adaptive learning rate, batch size, neural network architecture, dropout / regularization, number of GPUs, etc.
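As a rough sketch of which metrics might be worth tracking for a stuck validation error, the helpers below compute the error rate, how concentrated the predictions are on a single class, and whether a metric has stopped moving. All names and thresholds here are hypothetical, and the code is plain Python rather than TensorBoard API calls; the resulting numbers could be written out as scalar summaries once per epoch, with the exact logging call depending on your TensorFlow version.

```python
from collections import Counter

def error_rate(predictions, labels):
    """Fraction of examples whose predicted class differs from the label."""
    wrong = sum(1 for p, y in zip(predictions, labels) if p != y)
    return wrong / len(labels)

def majority_fraction(predictions):
    """Fraction of predictions that fall on the most common class.

    A value close to 1.0 means the model predicts almost a single class.
    """
    counts = Counter(predictions)
    return counts.most_common(1)[0][1] / len(predictions)

def has_plateaued(history, patience=5, tol=1e-6):
    """True if the last `patience` values differ by less than `tol`."""
    if len(history) < patience:
        return False
    recent = history[-patience:]
    return max(recent) - min(recent) < tol
```

Plotting `majority_fraction` on the validation set next to the validation error makes it easy to see when the error is flat because the network predicts the same class for every input.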

Here is the link that I found helpful for adding these details: https://www.tensorflow.org/how_tos/graph_viz/#runtime_statistics