I have a set of questions about the activation functions used in neural networks. I would highly appreciate it if someone could give good explanatory answers.
- Why is ReLU used only on hidden layers, and not on the output layer?
- Why is Sigmoid not used for multi-class classification?
- Why do we use no activation function (i.e., a linear output) in regression problems whose target values can be negative?
- Why do we use `average='micro'`, `'macro'`, or `'weighted'` when calculating performance metrics in multi-class classification?
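To make the last question concrete, here is a minimal pure-Python sketch of what the averaging modes mean, following scikit-learn's semantics for `average='micro'`, `'macro'`, and `'weighted'` in `f1_score`. The toy labels below are invented for illustration only:

```python
from collections import Counter

# Toy multi-class labels (classes 0, 1, 2); illustrative data only.
y_true = [0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 1, 1, 2, 2]

labels = sorted(set(y_true) | set(y_pred))

def per_class_counts(label):
    # One-vs-rest true positives, false positives, false negatives.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    return tp, fp, fn

def f1(tp, fp, fn):
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# 'macro': average per-class F1 scores, treating every class equally.
macro_f1 = sum(f1(*per_class_counts(l)) for l in labels) / len(labels)

# 'micro': pool TP/FP/FN across all classes, then compute one global F1.
tp = sum(per_class_counts(l)[0] for l in labels)
fp = sum(per_class_counts(l)[1] for l in labels)
fn = sum(per_class_counts(l)[2] for l in labels)
micro_f1 = f1(tp, fp, fn)

# 'weighted': average per-class F1 weighted by each class's support.
support = Counter(y_true)
weighted_f1 = sum(f1(*per_class_counts(l)) * support[l] for l in labels) / len(y_true)

print(f"macro={macro_f1:.4f} micro={micro_f1:.4f} weighted={weighted_f1:.4f}")
# macro=0.6556 micro=0.6667 weighted=0.6778
```

In short: macro treats rare classes as important as common ones, micro is dominated by the frequent classes (for single-label multi-class it equals accuracy), and weighted sits in between by scaling each class's score by its frequency.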