1 vote

I'm trying to implement a multilayer perceptron with backpropagation and a single hidden layer in MATLAB. The objective is to replicate a function with two inputs and one output.

The problem I'm having is that the error decreases with every epoch at first, but then it reaches a plateau and doesn't seem to improve, as seen in:

[Figure: training error (MSE) per epoch, decreasing and then flattening out]

This is a plot of all the individual errors during a single epoch:

[Figure: per-sample errors over one epoch, with a few extreme outliers]

As you can see, there are some extreme cases that are not being handled correctly.

I'm using the following setup (a compact sketch of the forward pass it implies appears right after this list):

  • Weights initialized between -1 and 1
  • Mean Square Error
  • Variable number of hidden neurons
  • Momentum
  • Randomized input order
  • No bias
  • tanh activation function for the hidden layer
  • Identity as the activation function of the output layer
  • Inputs in the range -3 to +3
  • Min-Max normalization of inputs
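
For reference, the forward pass implied by this setup (tanh hidden layer, identity output, no bias) can be written compactly for a whole batch of samples. This is only a sketch, assuming the same weight shapes as in the full code below (Wh is Number_Neurons-by-2, Wo is Number_Neurons-by-1, and Xn is an M-by-2 matrix of normalized inputs):

% Sketch: batch forward pass for the architecture described above.
% Xn is M-by-2 normalized inputs; Wh and Wo have the same shapes as
% in the full code further down.
S  = Xn * Wh';   % M-by-Number_Neurons hidden pre-activations
Z  = tanh(S);    % hidden activations
Yc = Z * Wo;     % identity output layer, M-by-1 predictions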

I have tried changing the number of neurons in the hidden layer and lowering the learning rate to very small values, but nothing seems to help.

Here is the MATLAB code:

clc
clear
%%%%%%%     DEFINITIONS  %%%%%%%%
i=0;
S=0;
X=rand(1000,2)*6-3; %generate inputs between -3,+3
Xval=rand(200,2)*6-3; %validation inputs
Number_Neurons=360;
Wh=rand(Number_Neurons,2)*2-1; %hidden weights
Wo=rand(Number_Neurons,1)*2-1;  %output weights
Learn=.001;% learning factor
momentumWh=0; %momentums
momentumWo=0;
a=.01;%momentum factor
WoN=Wo; %new weight

fxy=@(x,y) (3.*(1-x).^2).*(exp(-x.^2-(y+1).^2))-10.*(x./5-x.^3-y.^5).*(exp(-x.^2-y.^2))-(exp(-(x+1).^2-y.^2))./3;   %function to be replicated (MATLAB's 'peaks' surface)

fh=@(x) tanh(x); %hidden layer activation function
dfh= @(x) 1-tanh(x).^2; %derivative

fo=@(x) x; %output layer activation function
dfo= @(x) 1; %derivative

%%GRAPH FUNCTION
%[Xg,Yg]=meshgrid(X(:,1),X(:,2));
% Y=fxy(Xg,Yg);
% surf(Xg,Yg,Y)
%%%%%%%%%
Yr=fxy(X(:,1),X(:,2)); %Y real
Yval=fxy(Xval(:,1),Xval(:,2)); %validation Y
Epoch=1;
Xn=(X+3)/6;%%%min max normalization
Xnval=(Xval+3)/6;
E=ones(1,length(Yr));% error
Eval=ones(1,length(Yval));%validation error
MSE=1;

%%%%%        ITERATION    %%%%%
while 1
    N=1;
    perm=randperm(size(X,1)); %permute the order of the training samples
    Yrand=Yr(perm);    %permuted outputs
    Xrand=Xn(perm,:);  %permuted inputs
    while N<=length(Yr) %epoch    

        %%%%%% forward pass %%%%%
        S=Wh*Xrand(N,:)'; %input multiplied by hidden weights  
        Z=fh(S); %activation function of hidden layer
        Yin=Z.*Wo; %output of hidden layer multiplied by output weights
        Yins=sum(Yin); %sum over the hidden units
        Yc=fo(Yins);% activation function of output layer, Predicted Y
        E(N)=Yrand(N)-Yc; %error

        %%%%%%%% back propagation %%%%%%%%%%%%%
        do=E(N).*dfo(Yins); %delta of output layer
        DWo=Learn*(do.*Z)+a*momentumWo; %output weight update (gradient step plus momentum)
        WoN=Wo+DWo; %new output weights
        momentumWo=DWo; %store update for momentum
        dh=do.*Wo.*dfh(S); %delta of hidden layer
        DWh1=Learn.*dh.*Xrand(N,1); %hidden weight update, first input
        DWh2=Learn.*dh.*Xrand(N,2); %hidden weight update, second input
        DWh=[DWh1 DWh2]+a*momentumWh; %full hidden weight update (gradient step plus momentum)
        Wh=Wh+DWh;  %new hidden layer weights
        momentumWh=DWh; %store update for momentum
        Wo=WoN; %update output weight
        N=N+1; %next value
    end

    MSET(Epoch)=(sum(E.^2))/length(E);  %training Mean Square Error
    N=1;    
    %%%%%% validation %%%%%%%
    while N<=length(Yval)
        S=Wh*Xnval(N,:)';    
        Z=fh(S);
        Yin=Z.*Wo;
        Yins=sum(Yin);
        Yc=fo(Yins);
        Eval(N)=Yc-Yval(N);
        N=N+1;    
    end

    MSE(Epoch)=(sum(Eval.^2))/length(Eval);   %validation Mean Square Error
    if MSE(Epoch)<=1 %stop condition
        break
    end
    disp(MSET(Epoch))
    disp(MSE(Epoch))
    Epoch=Epoch+1; %next epoch
end
Have you tried increasing the learning rate and seeing if the error drops? – Simon
I have tried learning rates from .1 to .00001; the best for me was .001 with a momentum factor of .01. Higher rates produce more oscillation in the MSE and actually make the performance worse. – Oscar Garcia
When I tried to train a neural network with one hidden layer, it did not learn when I provided only data with false expected results at the beginning. When the expected results were "mixed" (some false, some true, some false, ...), it solved the problem. – Luke
I don't understand what you are trying to say, Luke. – Oscar Garcia

1 Answer

0 votes

There are a number of factors that can come into play for the particular problem that you are trying to solve:

  • The complexity of the problem: Is the problem considered easy for a neural network to solve? (If using a standard dataset, have you compared your results to other studies?)
  • The inputs: Are the inputs strongly related to the output? Are there more inputs that you can add to the NN? Are they preprocessed correctly?
  • Local minima vs. global minima: Are you sure that training has stopped in a local minimum (a place where the NN gets stuck and cannot reach a more optimal solution)?
  • Outputs: Are the output samples skewed in some way? Is this a binary-output kind of problem, and are there enough samples of both classes?
  • Activation function: Is there another activation function that is more appropriate for the problem (see the sketch after this list)?
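
For example, swapping the hidden activation in the question's code only requires changing the two function handles fh and dfh; a logistic-sigmoid variant would look like this (a sketch only, not necessarily better for this particular problem):

fh  = @(x) 1./(1+exp(-x));      %logistic sigmoid hidden activation
dfh = @(x) fh(x).*(1-fh(x));    %its derivative, reusing the captured fh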

Then there are the hidden layers, neurons, learning rate, momentum, epochs, etc., which you appear to have trialled.

Based on the chart, this is roughly the kind of learning performance that would be expected for a BPNN; however, trial and error is sometimes required to optimise the result from there.
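
If further trial and error is needed, a small grid search at least makes it systematic. The sketch below is purely illustrative: train_mlp is a hypothetical wrapper around the training loop from the question that runs a fixed number of epochs and returns the final validation MSE.

% Illustrative grid search over hidden-layer size and learning rate.
% train_mlp(hidden, rate, epochs) is a hypothetical helper wrapping the
% training loop from the question; it returns the validation MSE.
hiddenSizes = [20 60 120 360];
learnRates  = [0.1 0.01 0.001 0.0001];
results = zeros(numel(hiddenSizes), numel(learnRates));
for ii = 1:numel(hiddenSizes)
    for jj = 1:numel(learnRates)
        results(ii,jj) = train_mlp(hiddenSizes(ii), learnRates(jj), 100);
    end
end
[bestMSE, idx] = min(results(:));
[bi, bj] = ind2sub(size(results), idx);
fprintf('Best validation MSE %.4f: %d neurons, rate %.4f\n', ...
    bestMSE, hiddenSizes(bi), learnRates(bj));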

I would try to work on the options above (particularly the pre-processing of the data) and see if this helps in your case.
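
As one concrete pre-processing experiment, the inputs could be standardized to zero mean and unit variance instead of the min-max scaling used in the question. This is only a sketch (it assumes MATLAB R2016b+ for implicit expansion):

% Sketch: z-score standardization as an alternative to min-max scaling.
% X and Xval are the training and validation inputs from the question;
% the validation set reuses the training statistics.
mu    = mean(X, 1);
sigma = std(X, 0, 1);
Xn    = (X - mu) ./ sigma;
Xnval = (Xval - mu) ./ sigma;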