Step by step guide to train a multilayer perceptron for the XOR case in Weka?

Question

I'm just getting started with Weka and having trouble with the first steps.

We've got our training set:

@relation PerceptronXOR
@attribute X1 numeric
@attribute X2 numeric
@attribute Output numeric
@data
1,1,-1
-1,1,1
1,-1,1
-1,-1,-1

As a first step I just want to train a network and then classify a set using the Weka GUI. What I've been doing so far:

Using Weka 3.7.0.

Start GUI.
Explorer.
Open file -> choose my arff file.
Classify tab.
Use training set radio button.
Choose -> functions > MultilayerPerceptron.
Click the 'MultilayerPerceptron' text at the top to open the settings.
Set Hidden layers to '2'. (If GUI is set to true, this shows that this is the network we want.) Click OK.
Click Start.

outputs:

=== Run information ===

Scheme:       weka.classifiers.functions.MultilayerPerceptron -L 0.3 -M 0.2 -N 500 -V 0 -S 0 -E 20 -H 2 -R
Relation:     PerceptronXOR
Instances:    4
Attributes:   3
              X1
              X2
              Output
Test mode:    evaluate on training data

=== Classifier model (full training set) ===

Linear Node 0
    Inputs    Weights
    Threshold    0.21069691964232443
    Node 1    1.8781169869419072
    Node 2    -1.8403146612166397
Sigmoid Node 1
    Inputs    Weights
    Threshold    -3.7331156814378685
    Attrib X1    3.6380519730323164
    Attrib X2    -1.0420815868133226
Sigmoid Node 2
    Inputs    Weights
    Threshold    -3.64785119182632
    Attrib X1    3.603244645539393
    Attrib X2    0.9535137571446323
Class 
    Input
    Node 0


Time taken to build model: 0 seconds

=== Evaluation on training set ===
=== Summary ===

Correlation coefficient                  0.7047
Mean absolute error                      0.6073
Root mean squared error                  0.7468
Relative absolute error                 60.7288 %
Root relative squared error             74.6842 %
Total Number of Instances                4

It seems odd that 500 iterations at a learning rate of 0.3 doesn't get the error down, but 5000 at 0.1 does, so let's go with that.

Now use the test data set:

@relation PerceptronXOR
@attribute X1 numeric
@attribute X2 numeric
@attribute Output numeric
@data
1,1,-1
-1,1,1
1,-1,1
-1,-1,-1
0.5,0.5,-1
-0.5,0.5,1
0.5,-0.5,1
-0.5,-0.5,-1

Radio button to 'Supplied test set'
Select my test set arff.
Click start.

=== Run information ===

Scheme:       weka.classifiers.functions.MultilayerPerceptron -L 0.1 -M 0.2 -N 5000 -V 0 -S 0 -E 20 -H 2 -R
Relation:     PerceptronXOR
Instances:    4
Attributes:   3
              X1
              X2
              Output
Test mode:    user supplied test set:  size unknown (reading incrementally)

=== Classifier model (full training set) ===

Linear Node 0
    Inputs    Weights
    Threshold    -1.2208619057226187
    Node 1    3.1172079341507497
    Node 2    -3.212484459911485
Sigmoid Node 1
    Inputs    Weights
    Threshold    1.091378074639599
    Attrib X1    1.8621040828953983
    Attrib X2    1.800744048145267
Sigmoid Node 2
    Inputs    Weights
    Threshold    -3.372580743113282
    Attrib X1    2.9207154176666386
    Attrib X2    2.576791630598144
Class 
    Input
    Node 0


Time taken to build model: 0.04 seconds

=== Evaluation on test set ===
=== Summary ===

Correlation coefficient                  0.8296
Mean absolute error                      0.3006
Root mean squared error                  0.6344
Relative absolute error                 30.0592 %
Root relative squared error             63.4377 %
Total Number of Instances                8

Why is it unable to classify these correctly?

Is it just because it's reached a local minimum quickly on the training data, and doesn't 'know' that that doesn't fit all the cases?

Questions.

Why does 500 iterations at 0.3 not work? Seems odd for such a simple problem.
Why does it fail on the test set?
How do I pass in a set to classify?

1. If the learning rate is too high it won't converge. Even 0.1 is somewhat large. — Josh S.

user3392574 · Accepted Answer · 2014-03-07T12:30:56

Using a learning rate of 0.5 does the job in 500 iterations for both examples. The learning rate controls how far the weights move on each update, that is, how much weight each new example is given. Apparently the problem is harder than it looks, and it is easy to get stuck in a local minimum with only 2 hidden nodes (note that -H 2 builds a single hidden layer with two nodes, not two hidden layers). If you use a low learning rate with a high number of iterations, the learning process is more conservative and more likely to find a good minimum.
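
One way to see what the two networks from the question actually compute is to evaluate the printed weights by hand. The sketch below does that in plain Java, under a few assumptions that are not stated in the original post: that Weka adds each Threshold as a bias, that a Sigmoid Node applies the logistic function, that the Linear Node is a plain weighted sum, and that Weka's default normalization leaves data already spanning [-1, 1] unchanged. The class and method names are made up for illustration.

```java
class XorMlpCheck {
    // Rows are {X1, X2, Output}, copied from the two ARFF files in the question.
    static final double[][] TRAIN = {
        {1, 1, -1}, {-1, 1, 1}, {1, -1, 1}, {-1, -1, -1}
    };
    static final double[][] TEST = {
        {1, 1, -1}, {-1, 1, 1}, {1, -1, 1}, {-1, -1, -1},
        {0.5, 0.5, -1}, {-0.5, 0.5, 1}, {0.5, -0.5, 1}, {-0.5, -0.5, -1}
    };

    // Weights printed by the first run (-L 0.3 -N 500).
    // Each hidden row is {Threshold, weight for X1, weight for X2}.
    static final double[][] HIDDEN_A = {
        {-3.7331156814378685, 3.6380519730323164, -1.0420815868133226},
        {-3.64785119182632, 3.603244645539393, 0.9535137571446323}
    };
    // Output node: {Threshold, weight for Node 1, weight for Node 2}.
    static final double[] OUT_A = {
        0.21069691964232443, 1.8781169869419072, -1.8403146612166397
    };

    // Weights printed by the second run (-L 0.1 -N 5000).
    static final double[][] HIDDEN_B = {
        {1.091378074639599, 1.8621040828953983, 1.800744048145267},
        {-3.372580743113282, 2.9207154176666386, 2.576791630598144}
    };
    static final double[] OUT_B = {
        -1.2208619057226187, 3.1172079341507497, -3.212484459911485
    };

    // Logistic sigmoid, the activation assumed for a "Sigmoid Node".
    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    // Forward pass of the 2-2-1 network; thresholds are assumed to act as biases.
    static double predict(double[][] hidden, double[] out, double x1, double x2) {
        double y = out[0];
        for (int i = 0; i < hidden.length; i++) {
            double[] h = hidden[i];
            y += out[i + 1] * sigmoid(h[0] + h[1] * x1 + h[2] * x2);
        }
        return y;
    }

    // Mean absolute error of the network over rows of {X1, X2, Output}.
    static double meanAbsoluteError(double[][] hidden, double[] out, double[][] data) {
        double sum = 0.0;
        for (double[] row : data) {
            sum += Math.abs(predict(hidden, out, row[0], row[1]) - row[2]);
        }
        return sum / data.length;
    }

    public static void main(String[] args) {
        System.out.printf("run 1, training set MAE: %.4f%n",
                meanAbsoluteError(HIDDEN_A, OUT_A, TRAIN));
        System.out.printf("run 2, test set MAE:     %.4f%n",
                meanAbsoluteError(HIDDEN_B, OUT_B, TEST));
        // Per-instance predictions of the second model on the test set.
        for (double[] row : TEST) {
            System.out.printf("(%5.1f, %5.1f) expected %4.1f predicted %7.4f%n",
                    row[0], row[1], row[2], predict(HIDDEN_B, OUT_B, row[0], row[1]));
        }
    }
}
```

If those assumptions hold, the two printed errors should land close to the 0.6073 and 0.3006 that Weka reports. For the second model the four corner points are fitted almost exactly, while the prediction for (0.5, 0.5) comes out at roughly 0.6 instead of -1. Nothing in the four training points constrains the interior of the square, so a low training error does not guarantee a good fit there.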
