How do you deal with multi-label classification with imbalanced label frequencies when training neural networks? One of the solutions I came across was to penalize the error on rare labels more heavily. Here is how I designed the network:
Number of classes: 100. The input layer, the 1st hidden layer, and the 2nd hidden layer (100 units) are fully connected, with dropout and ReLU. The output of the 2nd hidden layer is py_x.
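For reference, a minimal sketch of that setup in TensorFlow 1.x. The input size, hidden-layer size, and initializer are assumptions; only the 100-unit output is stated above:

import tensorflow as tf

n_input, n_hidden = 784, 256   # assumed sizes; only the 100 outputs are given
n_classes = 100

X = tf.placeholder(tf.float32, [None, n_input])
Y = tf.placeholder(tf.float32, [None, n_classes])  # weighted multi-hot labels
keep_prob = tf.placeholder(tf.float32)

w1 = tf.Variable(tf.random_normal([n_input, n_hidden], stddev=0.01))
w2 = tf.Variable(tf.random_normal([n_hidden, n_classes], stddev=0.01))

# 1st hidden layer: fully connected + ReLU + dropout
h1 = tf.nn.dropout(tf.nn.relu(tf.matmul(X, w1)), keep_prob)
# 2nd hidden layer (100 units): its output py_x is used directly as the logits,
# so no ReLU here (sigmoid_cross_entropy_with_logits expects raw logits)
py_x = tf.matmul(h1, w2)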
cost = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=py_x, labels=Y))
Here Y is a modified version of the one-hot (multi-hot) encoding, with values between 1 and 5 set for all the labels of a sample. The value is ~1 for the most frequent label and ~5 for the rarest labels. The values are not discrete; i.e., the new value of a label in the encoding is based on the formula
new value = 1 + 4 * (1 - percentage_of_label / 100)
For example, <0, 0, 1, 0, 1, ...> would be converted to something like <0, 0, 1.034, 0, 3.667, ...>. NOTE: only the 1-valued entries of the original vectors are changed.
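A minimal sketch of that conversion in NumPy. The label percentages below are assumptions, chosen only so the output matches the example values above:

import numpy as np

# Assumed label frequencies in percent; 99.15% and 33.33% reproduce the example.
percentage = np.array([80.0, 60.0, 99.15, 50.0, 33.33])

# Per-label weight: 1 + 4 * (1 - percentage / 100)
weight = 1.0 + 4.0 * (1.0 - percentage / 100.0)

y = np.array([0, 0, 1, 0, 1], dtype=np.float32)  # original multi-hot vector
y_weighted = y * weight                          # only the 1-entries are scaled
print(y_weighted)  # [0.  0.  1.034  0.  3.6668]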
This way, if the model incorrectly predicts a rare label, its error gradient is large: the gradient of sigmoid cross-entropy with respect to a logit is sigmoid(logit) - label, e.g. 0.0001 - 5 = -4.9999, so a missed rare label back-propagates a heavier error than a mislabeling of a very frequent label.
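A quick numerical check of that claim (a sketch; the logit value -9 is an assumption, chosen only so that sigmoid(-9) is roughly 0.0001):

# d(loss)/d(logit) for sigmoid cross-entropy is sigmoid(logit) - label,
# so a weighted label of 5 against a near-zero prediction gives a gradient near -5.
logit = tf.constant([-9.0])   # sigmoid(-9) ~= 0.0001
label = tf.constant([5.0])    # weighted label for a rare class
loss = tf.nn.sigmoid_cross_entropy_with_logits(logits=logit, labels=label)
grad = tf.gradients(loss, logit)[0]

with tf.Session() as sess:
    print(sess.run(grad))     # ~ [-4.9999]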
Is this the right way to penalize rare labels? Are there better methods to deal with this problem?