I'm trying to train my network on MNIST using a self-made CNN (C++).

It gives reasonably good results with a simple model such as: Convolution (2 feature maps, 5x5) (Tanh) -> MaxPool (2x2) -> Flatten -> Fully-Connected (64) (Tanh) -> Fully-Connected (10) (Sigmoid).

After 4 epochs, it behaves as shown here [1].
After 16 epochs, it gives ~6.5% error on the test dataset.

But with 4 feature maps in the convolution layer, the MSE doesn't improve, and sometimes it even increases by a factor of 2.5 [2].

I train in online mode with the Adam optimizer (alpha: 0.01, beta_1: 0.9, beta_2: 0.999, epsilon: 1.0e-8). The weight update is calculated as:

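// t is the 0-based step index; m_t and v_t are the per-weight moment
// estimates (updated in place); g_t is the current gradient.
double AdamOptimizer::calc(int t, double& m_t, double& v_t, double g_t)
{
    // Exponential moving averages of the gradient and the squared gradient.
    m_t = this->beta_1 * m_t + (1.0 - this->beta_1) * g_t;
    v_t = this->beta_2 * v_t + (1.0 - this->beta_2) * (g_t * g_t);

    // Bias-corrected estimates (t + 1 because t starts at 0).
    double m_t_aver = m_t / (1.0 - std::pow(this->beta_1, t + 1));
    double v_t_aver = v_t / (1.0 - std::pow(this->beta_2, t + 1));

    // Delta to add to the weight.
    return -(this->alpha * m_t_aver) / (std::sqrt(v_t_aver) + this->epsilon);
}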

So, could this problem be caused by the lack of additional training techniques (dropout, batch normalization) or by badly chosen parameters? Or is it caused by an implementation issue?

P.S. I can provide a GitHub link if necessary.

I would suggest sharing at least the relevant snippet of your code. - Semih Korkmaz
Added. But the code is fairly complicated, so I'd welcome any advice on where to look for the problem... - mrhemen2015

1 Answer

Try decreasing the learning rate. Your alpha of 0.01 is ten times the commonly used Adam default of 0.001.
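To get a feel for why alpha matters so much here, the following is a minimal sketch, not your actual code. It re-implements the formula from the question as a free function and feeds it a constant gradient. The names `AdamParams`, `adam_step` and `total_move` are made up for this illustration. With a constant gradient the bias-corrected moments equal g and g^2, so every step is roughly -alpha * sign(g), whatever the gradient magnitude. How far a weight travels is therefore set almost entirely by alpha and the number of updates, and in online mode there is one update per sample.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-alone parameters, mirroring the values in the question.
struct AdamParams {
    double alpha   = 0.001;
    double beta_1  = 0.9;
    double beta_2  = 0.999;
    double epsilon = 1.0e-8;
};

// Same formula as AdamOptimizer::calc in the question: returns the delta
// to add to the weight and updates the moment estimates in place.
// t is the 0-based step index.
double adam_step(const AdamParams& p, int t, double& m_t, double& v_t, double g_t)
{
    m_t = p.beta_1 * m_t + (1.0 - p.beta_1) * g_t;
    v_t = p.beta_2 * v_t + (1.0 - p.beta_2) * (g_t * g_t);

    double m_hat = m_t / (1.0 - std::pow(p.beta_1, t + 1));
    double v_hat = v_t / (1.0 - std::pow(p.beta_2, t + 1));

    return -(p.alpha * m_hat) / (std::sqrt(v_hat) + p.epsilon);
}

// Total displacement of a single weight after `steps` updates that all
// see the same gradient g (an artificial case, chosen to be easy to reason about).
double total_move(double alpha, double g, int steps)
{
    assert(steps >= 0);

    AdamParams p;
    p.alpha = alpha;

    double m = 0.0;     // first moment estimate, zero-initialized
    double v = 0.0;     // second moment estimate, zero-initialized
    double moved = 0.0;

    for (int t = 0; t < steps; ++t) {
        moved += adam_step(p, t, m, v, g);
    }
    return moved;
}
```

For example, `total_move(0.01, 2.0, 100)` comes out at about -1.0, while `total_move(0.001, 2.0, 100)` is about -0.1. Real gradients are not constant, so treat this only as an illustration of the scale of the steps, not as a prediction for your network.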