10
votes

I tried to implement a feedforward neural network.

This is the structure: an input layer with 8 neurons, a hidden layer with 8 neurons, and an output layer with 8 neurons.

The input data are vectors of 8 bits (1 bit for each neuron of the input layer). The outputs of the neural network are also vectors of 8 bits. So in total the dataset has 256 examples.

Example: if given x = [0.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 1.0]

the output must be y = [1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0]

This is the implementation:

from keras.models import Sequential
from keras.layers import Dense
import numpy as np
import random

#Dimension of layers
dim = 8

#Generate dataset
X = []
for i in range(0,2**dim):
    n = [float(x) for x in bin(i)[2:]]
    X.append([0.]*(dim-len(n))+n)
y = X[:]
random.shuffle(y)
X = np.array(X)
y = np.array(y)

# create model
model = Sequential()
model.add(Dense(dim, input_dim=dim, init='normal', activation='sigmoid'))
model.add(Dense(dim, init='normal', activation='sigmoid'))
model.add(Dense(dim, init='normal', activation='sigmoid'))

# Compile model
model.compile(loss='mse', optimizer='SGD', metrics=['accuracy'])
# Fit the model
model.fit(X, y, nb_epoch=1000, batch_size=50, verbose=0)
# evaluate the model
scores = model.evaluate(X, y)
print("%s: %.2f%%" % (model.metrics_names[1], scores[1]*100))
output = model.predict(X)

#Make the output binary
for i in range(0, output[:,0].size):
    for j in range(0, output[0].size):
        if output[i][j] >= 0.5:
            output[i][j] = 1
        else:
            output[i][j] = 0
print(output)

This is what I get in output:

acc: 50.39%
[[ 1.  0.  0. ...,  0.  1.  1.]
[ 1.  0.  0. ...,  0.  1.  1.]
[ 1.  0.  0. ...,  0.  1.  1.]
..., 
[ 1.  0.  0. ...,  0.  1.  1.]
[ 1.  0.  0. ...,  0.  1.  1.]
[ 1.  0.  0. ...,  0.  1.  1.]]

It seems that all outputs have the same value, so I don't know what's wrong with the configuration. I tried the suggestion from Cannot train a neural network in keras - stackoverflow, which is to remove the activation function from the output layer, but when I run that, every output vector has this value:

[ 0. 1. 1. ..., 1. 1. 1.]

Any insights on how to make it work?

How many times did you try to rerun this? Maybe using a different optimizer or a regularization / randomization algorithm might help. It seems that your network is likely getting stuck in local minima. – Marcin Możejko
I tried to rerun it like 15 times and got the same result. I tried using "Adam" and tried using "relu" activation and it improved a little bit; now I get different outputs, but accuracy is still very low (4 out of 256 correct outputs). – Chack Rodríguez
Have you tried to use e.g. dropout? Or batch normalization? – Marcin Możejko
Interesting choice of data. Your objective is to learn the shuffling function? – Mikael Rousson
Sorry for the delay. Yeah, I tried dropout, but it doesn't seem to work. – Chack Rodríguez

3 Answers

12
votes

I had the very same problem.

I would suggest reducing the learning rate for SGD. In my case I had used the Adam optimizer with lr=0.001, but changing it to 0.0001 solved the problem.

Default parameters for SGD are:

keras.optimizers.SGD(lr=0.01, momentum=0.0, decay=0.0, nesterov=False)
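
For example, a minimal sketch of compiling with a lower learning rate (assuming the same Keras version as in the question, where the argument is named lr; newer releases renamed it to learning_rate):

from keras.optimizers import SGD, Adam

# Learning rates below the defaults (SGD: 0.01, Adam: 0.001)
sgd = SGD(lr=0.001)
adam = Adam(lr=0.0001)

# Pass the optimizer instance instead of the string name
model.compile(loss='mse', optimizer=adam, metrics=['accuracy'])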

7
votes

The output is essentially multi-label classification, so I would recommend:

  1. Change the loss function to binary_crossentropy.
  2. Keep sigmoid as the activation of the output layer and change the others - relu can be a good choice.
  3. Add validation to your "fit" call and increase the verbosity - this will let you see how the network changes over the epochs, and especially when it over/underfits (points 1-3 are sketched after this list).
  4. Add depth to the network until you overfit.
  5. Add regularization to the network until you don't overfit.
  6. Repeat steps 4 and 5.
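
A minimal sketch of points 1-3 applied to the model from the question (argument names follow the same older Keras API used there, e.g. init and nb_epoch):

model = Sequential()
model.add(Dense(dim, input_dim=dim, init='normal', activation='relu'))
model.add(Dense(dim, init='normal', activation='relu'))
# Keep sigmoid on the output layer for multi-label targets
model.add(Dense(dim, init='normal', activation='sigmoid'))

model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# validation_split holds out part of the data; verbose=2 prints one line per epoch
model.fit(X, y, validation_split=0.2, nb_epoch=1000, batch_size=50, verbose=2)
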
1
vote

If you have tried all of the above and it still does not work, it means you are trying to fit noise: there is no connection/correlation/relevance between your inputs and outputs.