
I am building a neural network to classify two different classes.

So it is a binary classification problem, and I am trying to solve it with a feedforward neural network.

However, the network is not able to learn: the accuracy never changes during training.

I will explain everything I did:

In particular, the dataset is composed of:

  • 65673 rows and 22 columns.

One of these columns is the target class, with values (0, 1), while the other 21 are the predictors. The dataset is balanced in this way:

  • 44% of the data (about 29470 rows) belongs to class 1
  • 56% of the data (about 36203 rows) belongs to class 0

All the data are in the range (0, 1), since they were normalized.
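
For reference, the normalization was a plain min-max scaling; a minimal sketch of that step, assuming scikit-learn's MinMaxScaler and that the target sits in the first column:

from sklearn.preprocessing import MinMaxScaler

# Illustrative only: scale every predictor column into the (0, 1) range.
# 'dataset' is the pandas DataFrame holding the data; column 0 is the target.
predictor_cols = dataset.columns[1:]
scaler = MinMaxScaler()
dataset[predictor_cols] = scaler.fit_transform(dataset[predictor_cols])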

For instance, here is the head of the dataset: [screenshot of the first rows]

As you can see, there are also NaN values, but I cannot simply replace them with 0, since 0 is already a meaningful value in other columns.
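
To give an idea of how much is missing, the per-column NaN percentage can be checked with something like this (illustrative snippet):

# Percentage of missing values in each column, highest first.
nan_pct = dataset.isna().mean() * 100
print(nan_pct.sort_values(ascending=False))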

Taking a look also at the mean, standard deviation, min, and max of each column: [summary statistics screenshot]

I decided to perform a correlation analysis of my data and obtained:

[correlation matrix screenshot]

Since the goal is to classify (or predict) the target value, as shown in the correlation matrix, the columns [s, t, u, v, z] seem to be uncorrelated with the target column. In addition, the columns:

  • [o, m] are 0.99 correlated
  • [q, r] are 0.95 correlated

So I also removed column o and column q.
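
Roughly, the correlation check and the column removal looked like this (a sketch only; the single-letter column names are placeholders for the real ones):

import seaborn as sns
import matplotlib.pyplot as plt

# Correlation matrix of all columns, target included.
corr = dataset.corr()
sns.heatmap(corr, annot=True, fmt=".2f")
plt.show()

# Drop one column from each highly correlated pair;
# the weakly correlated columns [s, t, u, v, z] could be dropped the same way.
dataset = dataset.drop(columns=['o', 'q'])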

And I obtained this situation:

[updated correlation matrix screenshot]

After that, I split the dataset into the target column and the predictor columns:

X = dataset.iloc[:, 1:]   # predictor columns
y = dataset.iloc[:, 0]    # target column

Then I created and fitted the model:

from keras.models import Sequential
from keras.layers import Dense, Dropout, ReLU
from keras.optimizers import Adam

model = Sequential()

# Input layer: one unit per predictor column.
model.add(Dense(X.shape[1], kernel_initializer='random_uniform', input_shape=(X.shape[1],)))
model.add(ReLU())
model.add(Dropout(0.1))
model.add(Dense(8))
model.add(ReLU())
model.add(Dropout(0.1))
model.add(Dense(4))
model.add(ReLU())
model.add(Dropout(0.1))
# Output layer: single sigmoid unit for binary classification.
model.add(Dense(1, activation='sigmoid'))

opt = Adam(lr=0.00001, beta_1=0.9, beta_2=0.999, amsgrad=False)
model.compile(loss="binary_crossentropy", optimizer=opt, metrics=["accuracy"])
model.fit(X, y, batch_size=64, epochs=100, validation_split=0.25)

The results I obtained were always like this:

Train on 49254 samples, validate on 16419 samples

Epoch 1/100 49254/49254 [==============================] - 5s 100us/step - loss: 0.6930 - acc: 0.5513 - val_loss: 0.6929 - val_acc: 0.5503

Epoch 2/100 49254/49254 [==============================] - 2s 48us/step - loss: 0.6927 - acc: 0.5516 - val_loss: 0.6926 - val_acc: 0.5503

Epoch 3/100 49254/49254 [==============================] - 2s 48us/step - loss: 0.6925 - acc: 0.5516 - val_loss: 0.6924 - val_acc: 0.5503

Epoch 4/100 49254/49254 [==============================] - 2s 48us/step - loss: 0.6922 - acc: 0.5516 - val_loss: 0.6921 - val_acc: 0.5503

Epoch 5/100 49254/49254 [==============================] - 2s 47us/step - loss: 0.6920 - acc: 0.5516 - val_loss: 0.6919 - val_acc: 0.5503

Epoch 6/100 49254/49254 [==============================] - 2s 47us/step - loss: 0.6917 - acc: 0.5516 - val_loss: 0.6917 - val_acc: 0.5503

Epoch 7/100 49254/49254 [==============================] - 2s 48us/step - loss: 0.6915 - acc: 0.5516 - val_loss: 0.6914 - val_acc: 0.5503

Epoch 8/100 49254/49254 [==============================] - 2s 49us/step - loss: 0.6913 - acc: 0.5516 - val_loss: 0.6912 - val_acc: 0.5503

Epoch 9/100 49254/49254 [==============================] - 2s 48us/step - loss: 0.6911 - acc: 0.5516 - val_loss: 0.6910 - val_acc: 0.5503

Epoch 10/100 49254/49254 [==============================] - 2s 48us/step - loss: 0.6909 - acc: 0.5516 - val_loss: 0.6908 - val_acc: 0.5503

. . .

Epoch 98/100 49254/49254 [==============================] - 2s 49us/step - loss: 0.6878 - acc: 0.5516 - val_loss: 0.6881 - val_acc: 0.5503

Epoch 99/100 49254/49254 [==============================] - 2s 49us/step - loss: 0.6878 - acc: 0.5516 - val_loss: 0.6881 - val_acc: 0.5503

Epoch 100/100 49254/49254 [==============================] - 2s 49us/step - loss: 0.6878 - acc: 0.5516 - val_loss: 0.6881 - val_acc: 0.5503

As you can see, the accuracy always remains fixed; this is the only model in which I could see some change in the loss function.

What I tried to do:

  • Use the sigmoid activation function in all the layers
  • Increase the number of nodes and the number of hidden layers
  • Add an l2 penalty in all the layers
  • Use different learning rates (from 0.01 to 0.000001)
  • Decrease or increase the batch_size

But in all cases, the result was the same or even worse.

I also tried different optimizers, since I was supposing that with this configuration the loss immediately reaches a local minimum.

I do not know what I can do to solve this problem; I was trying to understand whether the problem is related to the weights of the network or to the data itself.

Since this dataset was built by sampling rows from different days of data, would it be better to use an RNN?

Also, about the normalization: is it right to normalize the data with min-max normalization?

Can someone help me understand this problem better? Thank you very much.

There is really no obvious source of error here. Can you tell what the % of NaN is in each column of your data? – Shaunak Sen
Also, I would suggest trying LeakyReLU as your activation and seeing if it helps. ReLU might be suffering from the vanishing gradient problem (adventuresinmachinelearning.com/…) – Shaunak Sen
Thank you @ShaunakSen, the % of NaN in each column is less than 0.10%. I also tried using LeakyReLU with different alpha settings, but the result is always the same. – traveller
Another problem that might be happening (although unlikely) is exploding gradients. There are 2 ways to handle this - by clipnorm or clipvalue. Keep the LeakyReLU config and set these values alternately on your optimizer and see if it makes a difference. Doc link: keras.io/optimizers Exploding grad problem: machinelearningmastery.com/… – Shaunak Sen
Thank you again, but also with the exploding gradient handling there is no difference. I tried both clipnorm and clipvalue, but the result did not change. – traveller
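
For reference, the LeakyReLU and gradient-clipping variants suggested above were tried roughly like this (an illustrative sketch only; the alpha and clipnorm values are arbitrary examples):

from keras.models import Sequential
from keras.layers import Dense, Dropout, LeakyReLU
from keras.optimizers import Adam

# Same architecture as in the question, with each ReLU() swapped for LeakyReLU
# and gradient clipping set on the optimizer.
model = Sequential()
model.add(Dense(X.shape[1], input_shape=(X.shape[1],)))
model.add(LeakyReLU(alpha=0.05))
model.add(Dropout(0.1))
model.add(Dense(8))
model.add(LeakyReLU(alpha=0.05))
model.add(Dropout(0.1))
model.add(Dense(4))
model.add(LeakyReLU(alpha=0.05))
model.add(Dropout(0.1))
model.add(Dense(1, activation='sigmoid'))

opt = Adam(lr=0.00001, clipnorm=1.0)  # or clipvalue=0.5
model.compile(loss="binary_crossentropy", optimizer=opt, metrics=["accuracy"])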

1 Answer


Your neural network is ridiculously small, and that is paired with a ridiculously small learning rate.

At least, do the following:

  • Increase your learning rate to 0.001
  • Increase the number of neurons to 16, 32 (and why not add a third layer of 64)

You can also increase the dropout to 0.5, because 0.1 is not enough. That's not the source of your problems, however. I'd certainly try to determine the optimal imputation strategy to deal with the missing values.
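
A minimal sketch of these suggestions on top of the setup from the question (the exact layer sizes and the median imputation are just examples, not a definitive fix):

from keras.models import Sequential
from keras.layers import Dense, Dropout, ReLU
from keras.optimizers import Adam
from sklearn.impute import SimpleImputer

# Example imputation for the NaN values; median is only one possible strategy.
X_imputed = SimpleImputer(strategy='median').fit_transform(X)

# Wider network, stronger dropout, and a larger learning rate.
model = Sequential()
model.add(Dense(64, input_shape=(X.shape[1],)))
model.add(ReLU())
model.add(Dropout(0.5))
model.add(Dense(32))
model.add(ReLU())
model.add(Dropout(0.5))
model.add(Dense(16))
model.add(ReLU())
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))

opt = Adam(lr=0.001)
model.compile(loss="binary_crossentropy", optimizer=opt, metrics=["accuracy"])
model.fit(X_imputed, y, batch_size=64, epochs=100, validation_split=0.25)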