I am building a neural network to classify samples into two classes.
It is a binary classification problem, which I am trying to solve with a feedforward neural network.
However, the network is not able to learn: the accuracy never changes during training.
Here is everything I did.
The dataset consists of 65673 rows and 22 columns.
One of these columns is the target class, with values 0 and 1, while the other 21 are the predictors. The classes are distributed as follows:
- about 45% of the rows (29470) belong to class 1
- about 55% of the rows (36203) belong to class 0
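As a quick arithmetic check on these counts, the snippet below computes the exact class shares and the accuracy a trivial classifier would get by always predicting the majority class (class 0). It uses only the two counts stated above; nothing else is assumed.

```python
# Class counts reported above
n_class0 = 36203
n_class1 = 29470
total = n_class0 + n_class1  # should equal the 65673 rows of the dataset

share0 = n_class0 / total
share1 = n_class1 / total

# Accuracy of a "model" that always predicts the majority class (class 0)
majority_baseline = max(share0, share1)

print(f"class 0: {share0:.2%}, class 1: {share1:.2%}")
print(f"majority-class baseline accuracy: {majority_baseline:.4f}")
```

The baseline comes out at roughly 0.551, which is a useful reference point when reading any accuracy reported during training.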
All values lie in the range [0, 1], since they were normalized.
For instance, here is the head of the dataset:
As you can see, there are also NaN values, but I cannot simply get rid of them, because 0 is a meaningful value in the other columns.
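A dense network cannot consume NaN directly: a NaN in any input feature propagates through the layers and turns the loss into nan, so missing values have to be replaced before training. One common approach that keeps a genuine 0 distinguishable from a missing value is to fill NaNs with a per-column statistic (here the median) and add a 0/1 "was missing" flag per column. The sketch below assumes a pandas DataFrame; the helper name `impute_with_indicators` and the toy columns are hypothetical, not taken from the original data.

```python
import numpy as np
import pandas as pd

def impute_with_indicators(df, columns, fill_values=None):
    """Add a <col>_missing flag for each column, then fill its NaNs.

    fill_values: per-column fill values (ideally medians computed on the
    training split only); if None, the medians of `df` itself are used.
    """
    out = df.copy()
    if fill_values is None:
        fill_values = out[columns].median()  # NaN is skipped when computing medians
    for col in columns:
        # 1.0 where the original value was NaN, 0.0 otherwise,
        # so a real 0 and a missing value stay distinguishable
        out[f"{col}_missing"] = out[col].isna().astype(float)
        out[col] = out[col].fillna(fill_values[col])
    return out

# Hypothetical toy frame: column "a" contains a meaningful 0 and a NaN
toy = pd.DataFrame({"a": [0.0, np.nan, 0.5, 1.0], "b": [0.2, 0.4, np.nan, 0.8]})
print(toy.isna().sum())  # number of NaN values per column
clean = impute_with_indicators(toy, ["a", "b"])
print(clean)
```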
Here are also the mean, standard deviation, min, and max of each column:
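For reference, summary statistics like these come from pandas' `describe()`. The sketch below also shows per-class feature means, one quick way to see whether a feature shifts between the two classes at all (a feature can still matter non-linearly even if its mean barely moves). The toy frame and the column name `target` are hypothetical stand-ins for the real data.

```python
import pandas as pd

# Hypothetical toy data; "target" stands in for the real label column
df = pd.DataFrame({
    "target": [0, 0, 1, 1],
    "m": [0.1, 0.3, 0.6, 0.8],
    "s": [0.5, 0.4, 0.5, 0.4],
})

# Overall statistics per column (count, mean, std, min, quartiles, max)
overall = df.describe().T
print(overall[["mean", "std", "min", "max"]])

# Mean of each feature within each class (rows: features, columns: class)
per_class = df.groupby("target").mean().T
print(per_class)
```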
I then performed a correlation analysis of the data and obtained:
Since the goal is to predict the target value, the correlation matrix shows that columns [s, t, u, v, z] do not seem to be correlated with the target column. In addition, some predictors are strongly correlated with each other:
- [o, m] have a correlation of 0.99
- [q, r] have a correlation of 0.95
So I also removed columns o and q.
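The correlation step described above can be sketched with pandas as follows: rank features by absolute Pearson correlation with the target, list feature pairs above a threshold, and drop one column from each pair. Note that Pearson correlation only captures linear, pairwise relationships, so a feature with near-zero correlation to the target may still be useful in combination with others. The helper `correlation_report`, the 0.95 threshold and the toy data are assumptions for illustration.

```python
import numpy as np
import pandas as pd

def correlation_report(df, target, pair_threshold=0.95):
    """Return |Pearson r| of each feature with the target, and the
    feature pairs whose |r| is at or above pair_threshold."""
    corr = df.corr()
    with_target = corr[target].drop(target).abs().sort_values()
    features = corr.index.drop(target)
    pairs = []
    for i, a in enumerate(features):
        for b in features[i + 1:]:
            r = corr.loc[a, b]
            if abs(r) >= pair_threshold:
                pairs.append((a, b, round(float(r), 3)))
    return with_target, pairs

# Hypothetical toy data: "o" is almost a copy of "m", "s" is pure noise
rng = np.random.default_rng(0)
m = rng.random(200)
toy = pd.DataFrame({
    "target": (m + 0.3 * rng.random(200) > 0.65).astype(int),
    "m": m,
    "o": m + 0.01 * rng.random(200),
    "s": rng.random(200),
})
with_target, pairs = correlation_report(toy, "target")
print(with_target)
print(pairs)

# Drop the second column of each highly correlated pair
reduced = toy.drop(columns=[b for _, b, _ in pairs])
```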
This gave me the following:
After that, I split the dataset into the predictor columns and the target column:
# The target is the first column; all remaining columns are predictors
X = dataset.iloc[:, 1:]
y = dataset.iloc[:, 0]
Then I created and fitted the model:
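from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import Adam

model = Sequential()
# Hidden layer with one unit per predictor; no activation is set, so it is linear
model.add(Dense(X.shape[1], kernel_initializer='random_uniform', input_shape=(X.shape[1],)))
# Output layer: one sigmoid unit giving the probability of class 1
model.add(Dense(1, activation='sigmoid'))
# `lr` is the argument name in older Keras versions (newer ones use `learning_rate`)
opt = Adam(lr=0.00001, beta_1=0.9, beta_2=0.999, amsgrad=False)
model.compile(loss="binary_crossentropy", optimizer=opt, metrics=["accuracy"])
# validation_split=0.25 holds out the last 25% of the rows, taken before shuffling
model.fit(X, y, batch_size=64, epochs=100, validation_split=0.25)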
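The first `Dense` layer in this model has no `activation` argument, so Keras applies a linear activation. Two linear layers in a row collapse into a single linear map, so before the final sigmoid the model can only compute a linear function of the inputs (the same family as logistic regression). The NumPy sketch below checks that identity numerically; the random weights stand in for trained ones and the layer width of 8 is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features = 8  # arbitrary width; any value gives the same result

# Weights of a Dense(n_features) layer with no activation, followed by Dense(1)
W1 = rng.normal(size=(n_features, n_features))
b1 = rng.normal(size=n_features)
W2 = rng.normal(size=(n_features, 1))
b2 = rng.normal(size=1)

x = rng.random((5, n_features))  # five hypothetical input rows

# Two stacked linear layers (the pre-sigmoid output)...
two_layers = (x @ W1 + b1) @ W2 + b2
# ...equal a single linear layer with merged weights and bias
W = W1 @ W2
b = b1 @ W2 + b2
one_layer = x @ W + b

print(np.allclose(two_layers, one_layer))
```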
The results I obtain are always like this:
Train on 49254 samples, validate on 16419 samples
Epoch 1/100 49254/49254 [==============================] - 5s 100us/step - loss: 0.6930 - acc: 0.5513 - val_loss: 0.6929 - val_acc: 0.5503
Epoch 2/100 49254/49254 [==============================] - 2s 48us/step - loss: 0.6927 - acc: 0.5516 - val_loss: 0.6926 - val_acc: 0.5503
Epoch 3/100 49254/49254 [==============================] - 2s 48us/step - loss: 0.6925 - acc: 0.5516 - val_loss: 0.6924 - val_acc: 0.5503
Epoch 4/100 49254/49254 [==============================] - 2s 48us/step - loss: 0.6922 - acc: 0.5516 - val_loss: 0.6921 - val_acc: 0.5503
Epoch 5/100 49254/49254 [==============================] - 2s 47us/step - loss: 0.6920 - acc: 0.5516 - val_loss: 0.6919 - val_acc: 0.5503
Epoch 6/100 49254/49254 [==============================] - 2s 47us/step - loss: 0.6917 - acc: 0.5516 - val_loss: 0.6917 - val_acc: 0.5503
Epoch 7/100 49254/49254 [==============================] - 2s 48us/step - loss: 0.6915 - acc: 0.5516 - val_loss: 0.6914 - val_acc: 0.5503
Epoch 8/100 49254/49254 [==============================] - 2s 49us/step - loss: 0.6913 - acc: 0.5516 - val_loss: 0.6912 - val_acc: 0.5503
Epoch 9/100 49254/49254 [==============================] - 2s 48us/step - loss: 0.6911 - acc: 0.5516 - val_loss: 0.6910 - val_acc: 0.5503
Epoch 10/100 49254/49254 [==============================] - 2s 48us/step - loss: 0.6909 - acc: 0.5516 - val_loss: 0.6908 - val_acc: 0.5503
. . .
Epoch 98/100 49254/49254 [==============================] - 2s 49us/step - loss: 0.6878 - acc: 0.5516 - val_loss: 0.6881 - val_acc: 0.5503
Epoch 99/100 49254/49254 [==============================] - 2s 49us/step - loss: 0.6878 - acc: 0.5516 - val_loss: 0.6881 - val_acc: 0.5503
Epoch 100/100 49254/49254 [==============================] - 2s 49us/step - loss: 0.6878 - acc: 0.5516 - val_loss: 0.6881 - val_acc: 0.5503
As you can see, the accuracy always stays fixed. This is the only model in which I could see any change in the loss function.
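The training accuracy (0.5516) and validation accuracy (0.5503) are very close to the share of class 0 in the data (36203 / 65673 ≈ 0.551), which is what a model predicting class 0 for every row would score. One way to check this is to threshold the predicted probabilities (for example the output of `model.predict` on the validation rows) and count how many rows are assigned to class 1. The helper `summarize_predictions` and the toy arrays below are hypothetical.

```python
import numpy as np

def summarize_predictions(y_true, probs, threshold=0.5):
    """Return (fraction predicted as class 1, accuracy,
    accuracy of always predicting the majority class)."""
    y_true = np.asarray(y_true).ravel()
    preds = (np.asarray(probs).ravel() >= threshold).astype(int)
    positive_rate = preds.mean()
    accuracy = (preds == y_true).mean()
    majority_share = max(y_true.mean(), 1 - y_true.mean())
    return positive_rate, accuracy, majority_share

# Toy example: 55% zeros, and probabilities that never cross 0.5,
# mimicking a network that outputs about the same value for every input
y_toy = np.array([0] * 55 + [1] * 45)
probs_toy = np.full(100, 0.45)
print(summarize_predictions(y_toy, probs_toy))
```

If the fraction of class-1 predictions is 0 and the accuracy equals the majority share, the network has collapsed to a constant prediction.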
What I have tried:
- Using the sigmoid activation function in all layers
- Increasing the number of nodes and hidden layers
- Adding an L2 penalty to all layers
- Using different learning rates (from 0.01 to 0.000001)
- Decreasing or increasing batch_size
In all cases, the result was the same or even worse.
I also tried different optimizers, since I suspected that with this configuration the training immediately gets stuck in a local minimum of the loss.
I do not know what else I can do to solve this problem. I am trying to understand whether the problem is related to the weights of the network or to the data itself.
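To separate a training problem from a data problem, one quick check is to compare a majority-class baseline with a plain logistic regression on the same (imputed, scaled) features using cross-validation. If the linear model cannot beat the baseline either, the features may carry little linear signal for the target. The sketch below uses scikit-learn with synthetic data from `make_classification` as a stand-in for the real `X` and `y`; a tree-based model such as scikit-learn's `HistGradientBoostingClassifier`, which accepts NaN inputs directly, could serve as a second, non-linear baseline.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the real predictors and target
X_demo, y_demo = make_classification(
    n_samples=2000, n_features=15, n_informative=5,
    weights=[0.55, 0.45], random_state=0,
)

# Majority-class baseline: always predicts the most frequent label
dummy_acc = cross_val_score(
    DummyClassifier(strategy="most_frequent"), X_demo, y_demo, cv=5
).mean()

# Simple linear model on the same features
logreg_acc = cross_val_score(
    LogisticRegression(max_iter=1000), X_demo, y_demo, cv=5
).mean()

print(f"majority baseline: {dummy_acc:.3f}  logistic regression: {logreg_acc:.3f}")
```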
Since this dataset was built by sampling rows from different days of data, would it be better to use an RNN?
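Independently of the architecture, the day structure affects validation. Keras's `validation_split` takes the last fraction of the rows before shuffling, so if the rows are ordered by day, the validation set comes from different days than the training set. One explicit alternative is to hold out whole days and pass the result through `validation_data`. The sketch assumes a day identifier column exists; the name `day` and the helper `split_by_day` are hypothetical.

```python
import numpy as np
import pandas as pd

def split_by_day(df, day_col, val_fraction=0.25, seed=0):
    """Hold out whole days for validation so that rows from the same day
    never appear in both splits. `day_col` is a hypothetical column name."""
    rng = np.random.default_rng(seed)
    days = df[day_col].unique()
    n_val = max(1, int(round(len(days) * val_fraction)))
    val_days = rng.choice(days, size=n_val, replace=False)
    is_val = df[day_col].isin(val_days)
    return df[~is_val], df[is_val]

# Toy frame: 8 days with 10 rows each
toy = pd.DataFrame({
    "day": np.repeat(np.arange(8), 10),
    "target": np.tile([0, 1], 40),
    "m": np.linspace(0, 1, 80),
})
train_df, val_df = split_by_day(toy, "day")
print(sorted(val_df["day"].unique()))
```

The two frames could then be fed to Keras as `model.fit(X_train, y_train, validation_data=(X_val, y_val), ...)` instead of using `validation_split`.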
Also, regarding normalization: is min-max normalization the right way to normalize these features?
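Min-max normalization maps each column to x' = (x - min) / (max - min). A detail that matters for validation is to compute min and max on the training rows only and reuse them for the validation rows, which may then fall slightly outside [0, 1]. The pandas sketch below does this and leaves NaN in place so imputation can be handled separately; the helper names and toy frames are hypothetical.

```python
import numpy as np
import pandas as pd

def fit_min_max(train):
    """Learn per-column min and max on the training rows only (NaN ignored)."""
    return train.min(), train.max()

def apply_min_max(df, col_min, col_max):
    """x' = (x - min) / (max - min), using the training min and max."""
    span = col_max - col_min
    span = span.where(span != 0, 1.0)  # constant training columns: avoid division by zero
    return (df - col_min) / span

# Hypothetical toy split
train = pd.DataFrame({"m": [2.0, 4.0, np.nan, 6.0], "s": [1.0, 1.0, 1.0, 1.0]})
val = pd.DataFrame({"m": [5.0, 8.0], "s": [1.0, 2.0]})

col_min, col_max = fit_min_max(train)
train_scaled = apply_min_max(train, col_min, col_max)
val_scaled = apply_min_max(val, col_min, col_max)  # values may exceed [0, 1]
print(val_scaled)
```

Standardization (subtracting the mean and dividing by the standard deviation) is a common alternative for neural network inputs; either way, the statistics should come from the training split.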
Can someone help me understand this problem better? Thank you very much.