0
votes

I have two gaussian distribution samples, one guassian contains 10,000 samples and the other gaussian also contains 10,000 samples, I would like to train a feed-forward neural network with these samples but I dont know how many samples I have to take in order to get an optimal decision boundary. Here is the code but I dont know exactly the solution and the output are weirds.

x1 = -49:1:50;
x2 = -49:1:50;
[X1, X2] = meshgrid(x1, x2);
Gaussian1 = mvnpdf([X1(:) X2(:)], mean1, var1);// for class A
Gaussian2 = mvnpdf([X1(:) X2(:)], mean2, var2);// for Class B
net = feedforwardnet(10);
G1 = reshape(Gaussian1, 10000,1);
G2 = reshape(Gaussian2, 10000,1);
input = [G1, G2];
output = [0, 1];
net = train(net, input, output);

When I ran the code it give me weird results. If the code is not correct, can someone please suggest me so that I can get a decision boundary for these two distributions.

1
It is not clear to me what you really want. Can you be more specific describing your training set please?Hatim Khouzaimi
In simple words, I want to find the accuracy of this feed forward NN, I want to draw a decision boundary of this classifier for this gaussian distributions. :)ASAD
To draw the decision boundary, this post may be of some assistance: stackoverflow.com/questions/33502666/…rayryeng

1 Answers

0
votes

I'm pretty sure that the input must be the Gaussian distribution (and not the x coordinates). In fact the NN has to understand the relationship between the phenomenons themselves that you are interested (the Gaussian distributions) and the output labels, and not between the space in which are contained the phenomenons and the labels. Moreover, If you choose the x coordinates, the NN will try to understand some relationship between the latter and the output labels, but the x are something of potentially constant (i.e., the input data might be even all the same, because you can have very different Gaussian distribution in the same range of the x coordinates only varying the mean and the variance). Thus the NN will end up being confused, because the same input data might have more output labels (and you don't want that this thing happens!!!).
I hope I was helpful.

P.S.: for doubt's sake I have to tell you that the NN doesn't fit very well the data if you have a small training set. Moreover don't forget to validate your data model using the cross-validation technique (a good rule of thumb is to use a 20% of your training set for the cross-validation set and another 20% of the same training set for the test set and thus to use only the remaining 60% of your training set to train your model).