I'm trying to implement a multilayer perceptron with backpropagation and a single hidden layer in MATLAB. The objective is to replicate a function with two inputs and one output.
The problem I'm having is that the error decreases with every epoch at first, but then it reaches a plateau and doesn't seem to improve, as can be seen here:
This is a plot of all the individual errors during a single epoch:
As you can see, there are some extreme cases that are not being handled correctly.
I'm using the following setup (a rough code sketch of one training step under it comes right after the list):
- Weights initialized from -1 to 1
- Mean Square Error
- Variable number of hidden neurons
- Momentum
- Randomized input order
- No bias terms
- tanh activation function for the hidden layer
- Identity as the activation function of the output layer
- Inputs in range of -3 to 3
- Min-Max normalization of inputs
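For reference, this is roughly what a single training step looks like under that setup (only a rough sketch with a made-up size and sample; the real variables and loops are in the full listing below):

nh = 5;                        % hidden neurons (illustrative size only)
Wh = rand(nh,2)*2-1;           % hidden weights in [-1,1]
Wo = rand(nh,1)*2-1;           % output weights in [-1,1]
Learn = 0.001; a = 0.01;       % learning rate and momentum factor
mWh = 0; mWo = 0;              % stored momentum terms
x = [0.5 0.2]; t = 1;          % one min-max normalized input row and its target
S  = Wh*x';                    % hidden pre-activations
Z  = tanh(S);                  % hidden activations
y  = sum(Z.*Wo);               % identity output (no bias)
e  = t - y;                    % per-sample error
do = e;                        % output delta (identity derivative is 1)
DWo = Learn*do.*Z + a*mWo;     % output weight update with momentum
dh  = do.*Wo.*(1-tanh(S).^2);  % hidden deltas (tanh derivative)
DWh = Learn*(dh*x) + a*mWh;    % hidden weight update, nh-by-2
Wo = Wo + DWo; Wh = Wh + DWh;  % apply the updates
mWo = DWo; mWh = DWh;          % keep them for the next momentum term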
I have tried changing the number of neurons in the hidden layer and lowering the learning rate to very small values, but nothing seems to help.
Here is the MATLAB code:
clc
clear
%%%%%%% DEFINITIONS %%%%%%%%
i=0;
S=0;
X=rand(1000,2)*6-3; %generate inputs between -3,+3
Xval=rand(200,2)*6-3; %validation inputs
Number_Neurons=360;
Wh=rand(Number_Neurons,2)*2-1; %hidden weights
Wo=rand(Number_Neurons,1)*2-1; %output weights
Learn=.001;% learning factor
momentumWh=0; %momentums
momentumWo=0;
a=.01;%momentum factor
WoN=Wo; %new weight
fxy=@(x,y) (3.*(1-x).^2).*(exp(-x.^2-(y+1).^2))-10.*(x./5-x.^3-y.^5).*(exp(-x.^2-y.^2))-(exp(-(x+1).^2-y.^2))./3; %function to be replicated (MATLAB's peaks surface)
fh=@(x) tanh(x); %hidden layer activation function
dfh= @(x) 1-tanh(x).^2; %derivative
fo=@(x) x; %output layer activation function
dfo= @(x) 1; %derivative
%%GRAPH FUNCTION
%[Xg,Yg]=meshgrid(X(:,1),X(:,2));
% Y=fxy(Xg,Yg);
% surf(Xg,Yg,Y)
%%%%%%%%%
Yr=fxy(X(:,1),X(:,2)); %Y real
Yval=fxy(Xval(:,1),Xval(:,2)); %validation Y
Epoch=1;
Xn=(X+3)/6;%%%min-max normalization: maps [-3,3] to [0,1]
Xnval=(Xval+3)/6;
E=ones(1,length(Yr));% error
Eval=ones(1,length(Yval));%validation error
MSE=1;
%%%%% ITERATION %%%%%
while 1
N=1;
perm=randperm(size(X,1)); %%%permute the training samples
Yrand=Yr(perm); %permuted targets
Xrand=Xn(perm,:);
while N<=length(Yr) %epoch
%%%%%%forward pass %%%%%
S=Wh*Xrand(N,:)'; %input multiplied by hidden weights
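% (Wh is Number_Neurons x 2 and Xrand(N,:)' is 2 x 1, so S is Number_Neurons x 1)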
Z=fh(S); %activation function of hidden layer
Yin=Z.*Wo; %output of hidden layer multiplied by output weights
Yins=sum(Yin); %sum over all hidden units
Yc=fo(Yins);% activation function of output layer, Predicted Y
E(N)=Yrand(N)-Yc; %error
%%%%%%%% back propagation %%%%%%%%%%%%%
do=E(N).*dfo(Yins); %delta of output layer
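% dfo is identically 1 for the identity output, so do reduces to the raw error E(N)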
DWo=Learn*(do.*Z)+a*momentumWo; %output weight update (gradient step plus momentum)
WoN=Wo+DWo;%New output weight
momentumWo=DWo; %store momentum
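% classical momentum: the update just applied, scaled by a, is added to the next update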
dh=do.*Wo.*dfh(S); %delta of hidden layer
DWh1=Learn.*dh.*Xrand(N,1); %update for the weights coming from input 1
DWh2=Learn.*dh.*Xrand(N,2); %update for the weights coming from input 2
DWh=[DWh1 DWh2]+a*momentumWh;%full hidden weight update plus momentum
Wh=Wh+DWh; %new hidden layer weights
momentumWh=DWh; %store momentum
Wo=WoN; %update output weight
N=N+1; %next value
end
MSET(Epoch)=(sum(E.^2))/length(E); %Mean Square Error Training
N=1;
%%%%%% validation %%%%%%%
while N<=length(Yval)
S=Wh*Xnval(N,:)';
Z=fh(S);
Yin=Z.*Wo;
Yins=sum(Yin);
Yc=fo(Yins);
Eval(N)=Yc-Yval(N); %validation error for this sample
N=N+1;
end
MSE(Epoch)=(sum(Eval.^2))/length(Eval); %validation mean squared error
if MSE(Epoch)<=1 %stop condition
break
end
disp(MSET(Epoch))
disp(MSE(Epoch))
Epoch=Epoch+1; %next epoch
end