What is the calculation process of loss functions in multi-class multi-label classification problems using deep learning?

Question

Dataset description:

(1) X_train: (6000,4) shape

(2) y_train: (6000,4) shape

(3) X_validation: (2000,4) shape

(4) y_validation: (2000,4) shape

(5) X_test: (2000,4) shape

(6) y_test: (2000,4) shape

Relationship between X and Y is shown here

For single label classification, the activation function of the last layer is Softmax and the loss function is categorical_crossentrop. And I know the mathematical calculation method for the loss function.

And for multi-class multi-label classification problems, the activation function of the last layer is sigmoid, and the loss function is binary_crossentrop. I want to know how the mathematical calculation method of the loss function works

It would be a great help to me if you let me know.

def MinMaxScaler(data):
  numerator = data - np.min(data)
  denominator = np.max(data) - np.min(data)
  return numerator / (denominator + 1e-5)


kki = pd.read_csv(filename,names=['UE0','UE1','UE2','UE3','selected_UE0','selected_UE1','selected_UE2','selected_UE3'])

print(kki)

def LoadData(file):
  xy = np.loadtxt(file, delimiter=',', dtype=np.float32)
  print("Data set length:", len(xy))
  tr_set_size = int(len(xy) * 0.6)
  xy[:, 0:-number_of_UEs] = MinMaxScaler(xy[:, 0:-number_of_UEs]) #number_of_UES : 4

  X_train = xy[:tr_set_size, 0: -number_of_UEs] #6000 row
  y_train = xy[:tr_set_size, number_of_UEs:number_of_UEs*2]

  X_valid = xy[tr_set_size:int((tr_set_size/3) + tr_set_size), 0:-number_of_UEs]
  y_valid = xy[tr_set_size:int((tr_set_size/3) + tr_set_size), number_of_UEs:number_of_UEs *2]

  X_test = xy[int((tr_set_size/3) + tr_set_size):, 0:-number_of_UEs]
  y_test = xy[int((tr_set_size/3) + tr_set_size):, number_of_UEs:number_of_UEs*2]

  print("Training X shape:", X_train.shape)
  print("Training Y shape:", y_train.shape)
  print("validation x shape:", X_valid.shape)
  print("validation y shape:", y_valid.shape)
  print("Test X shape:", X_test.shape)
  print("Test Y shape:", y_test.shape)
  return X_train, y_train, X_valid, y_valid, X_test, y_test, tr_set_size


X_train, y_train, X_valid, y_valid, X_test, y_test, tr_set_size = LoadData(filename)

model = Sequential()
model.add(Dense(64,activation='relu', input_shape=(X_train.shape[1],)))
model.add(Dense(46, activation='relu'))
model.add(Dense(24, activation='relu')) 
model.add(Dense(12, activation='relu'))
model.add(Dense(4, activation= 'sigmoid'))

model.compile( loss ='binary_crossentropy', optimizer='adam', metrics=['accuracy']) 

hist = model.fit(X_train, y_train, epochs=5, batch_size=1, verbose= 1, validation_data=(X_valid, y_valid), callbacks= es)

This is a learning process, and even if epochs are repeated, Accuracy does not improve.

Epoch 1/10

6000/6000 [==============================] - 14s 2ms/step - loss: 0.2999 - accuracy: 0.5345 - val_loss: 0.1691 - val_accuracy: 0.5465
Epoch 2/10

6000/6000 [==============================] - 14s 2ms/step - loss: 0.1554 - accuracy: 0.4883 - val_loss: 0.1228 - val_accuracy: 0.4710
Epoch 3/10

6000/6000 [==============================] - 14s 2ms/step - loss: 0.1259 - accuracy: 0.4710 - val_loss: 0.0893 - val_accuracy: 0.4910
Epoch 4/10

6000/6000 [==============================] - 13s 2ms/step - loss: 0.1094 - accuracy: 0.4990 - val_loss: 0.0918 - val_accuracy: 0.5540
Epoch 5/10

6000/6000 [==============================] - 13s 2ms/step - loss: 0.0967 - accuracy: 0.5223 - val_loss: 0.0671 - val_accuracy: 0.5405
Epoch 6/10

6000/6000 [==============================] - 13s 2ms/step - loss: 0.0910 - accuracy: 0.5198 - val_loss: 0.0836 - val_accuracy: 0.5380
Epoch 7/10

6000/6000 [==============================] - 13s 2ms/step - loss: 0.0870 - accuracy: 0.5348 - val_loss: 0.0853 - val_accuracy: 0.5775
Epoch 8/10

6000/6000 [==============================] - 13s 2ms/step - loss: 0.0859 - accuracy: 0.5518 - val_loss: 0.0515 - val_accuracy: 0.6520
Epoch 9/10

6000/6000 [==============================] - 13s 2ms/step - loss: 0.0792 - accuracy: 0.5508 - val_loss: 0.0629 - val_accuracy: 0.4350
Epoch 10/10

6000/6000 [==============================] - 13s 2ms/step - loss: 0.0793 - accuracy: 0.5638 - val_loss: 0.0632 - val_accuracy: 0.6270

Adarsh Wase Adarsh Wase · Accepted Answer · 2021-08-23T05:58:38

Mistake 1 - The shape of y_train, y_validation and y_test should be (6000,), (2000,) and (2000,) respectively.

Mistake 2 - For multi-class classification, the loss should be categorical_crossentropy and activation should be a softmax. So, change these two lines, like this:

model.add(Dense(4, activation= 'softmax'))

model.compile(loss ='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

Suggestion - Why are you splitting data by yourself? Use scikit-learn train_test_split. This code will give you proper splits:

from sklearn.model_selection import train_test_split

x, x_test, y, y_test = train_test_split(xtrain, labels, test_size=0.2, train_size=0.8)
x_train, x_validation, y_train, y_validation = train_test_split(x, y, test_size = 0.25, train_size =0.75)

What is the calculation process of loss functions in multi-class multi-label classification problems using deep learning?

1 Answers