I have a set of questions about the activation functions used in neural networks. I would highly appreciate it if someone could give good explanatory answers.
- Why is ReLU used only on hidden layers, and not on the output layer?
- Why is Sigmoid not used for multi-class classification?
- Why do we use no activation function (i.e., a linear output) in regression problems whose target values can be negative?
- Why do we use `average='micro'`, `'macro'`, or `'weighted'` when calculating performance metrics in multi-class classification?
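To make the last question concrete, here is a minimal pure-Python sketch of what the averaging modes mean, following scikit-learn's semantics for `average='micro'`, `'macro'`, and `'weighted'` in `f1_score`. The toy labels below are invented for illustration only:

```python
from collections import Counter

# Toy multi-class labels (classes 0, 1, 2); illustrative data only.
y_true = [0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 1, 1, 2, 2]

labels = sorted(set(y_true) | set(y_pred))

def per_class_counts(label):
    # One-vs-rest true positives, false positives, false negatives.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    return tp, fp, fn

def f1(tp, fp, fn):
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# 'macro': average per-class F1 scores, treating every class equally.
macro_f1 = sum(f1(*per_class_counts(l)) for l in labels) / len(labels)

# 'micro': pool TP/FP/FN across all classes, then compute one global F1.
tp = sum(per_class_counts(l)[0] for l in labels)
fp = sum(per_class_counts(l)[1] for l in labels)
fn = sum(per_class_counts(l)[2] for l in labels)
micro_f1 = f1(tp, fp, fn)

# 'weighted': average per-class F1 weighted by each class's support.
support = Counter(y_true)
weighted_f1 = sum(f1(*per_class_counts(l)) * support[l] for l in labels) / len(y_true)

print(f"macro={macro_f1:.4f} micro={micro_f1:.4f} weighted={weighted_f1:.4f}")
# macro=0.6556 micro=0.6667 weighted=0.6778
```

In short: macro treats rare classes as important as common ones, micro is dominated by the frequent classes (for single-label multi-class it equals accuracy), and weighted sits in between by scaling each class's score by its frequency.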