For binary classification, we could either go for a final linear layer with 1 output, and use a sigmoid with a threshold, or a final linear layer with 2 outputs, and use a softmax. Is there any advantage to one vs the other?
If you are doing binary classification, I would suggest a single output node with a sigmoid. If your problem is multi-class classification, I would suggest as many output nodes as there are labels, with a softmax.
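To see why the single-output version loses nothing in the binary case, here is a rough sketch in plain Python (no framework assumed). A softmax over two logits gives the same class-1 probability as a sigmoid applied to the difference of those logits, so the second output is redundant. The helper names `sigmoid` and `softmax2` and the logit values are made up for illustration.

```python
import math

def sigmoid(z):
    """Logistic function: maps a single logit to P(class 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def softmax2(z0, z1):
    """Softmax over two logits; returns (P(class 0), P(class 1))."""
    m = max(z0, z1)  # subtract the max before exponentiating
    e0, e1 = math.exp(z0 - m), math.exp(z1 - m)
    total = e0 + e1
    return e0 / total, e1 / total

# Hypothetical logits from a 2-output final layer.
z0, z1 = 0.3, 1.7

p_softmax = softmax2(z0, z1)[1]   # P(class 1) from the 2-output model
p_sigmoid = sigmoid(z1 - z0)      # P(class 1) from one logit, z = z1 - z0

# The two agree up to floating-point error.
print(abs(p_softmax - p_sigmoid) < 1e-12)
```

So the 2-output softmax model carries one extra set of weights that only matters through the difference `z1 - z0`. The 1-output sigmoid model learns that difference directly, and thresholding its output at 0.5 corresponds to picking the larger of the two softmax outputs.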