I'm training a self-made CNN (written in C++) on MNIST.
It gives reasonably good results with a simple model such as: Convolution (2 feature maps, 5x5) (Tanh) -> MaxPool (2x2) -> Flatten -> Fully-Connected (64) (Tanh) -> Fully-Connected (10) (Sigmoid).
After 4 epochs, it behaves as shown here [1].
After 16 epochs, it reaches ~6.5% error on the test set.
But with 4 feature maps in the convolution layer, the MSE does not improve, and sometimes it even increases by a factor of 2.5 [2].
Training is done online (the weights are updated after every sample) with the Adam optimizer (alpha: 0.01, beta_1: 0.9, beta_2: 0.999, epsilon: 1.0e-8). The update is calculated as:
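// Returns the update for one parameter (note the minus sign: a descent step).
// t + 1 in the bias correction implies that t is zero-based.
double AdamOptimizer::calc(int t, double& m_t, double& v_t, double g_t)
{
    // Exponential moving averages of the gradient and the squared gradient.
    m_t = this->beta_1 * m_t + (1.0 - this->beta_1) * g_t;
    v_t = this->beta_2 * v_t + (1.0 - this->beta_2) * (g_t * g_t);
    // Bias-corrected estimates of both moments.
    double m_t_aver = m_t / (1.0 - std::pow(this->beta_1, t + 1));
    double v_t_aver = v_t / (1.0 - std::pow(this->beta_2, t + 1));
    return -(this->alpha * m_t_aver) / (std::sqrt(v_t_aver) + this->epsilon);
}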
So, can this problem be caused by the lack of some additional learning techniques (dropout, batch normalization) or by wrongly set parameters? Or is it caused by an implementation issue?
P.S. I can provide a GitHub link if necessary.