So I am new to ML and trying to make a simple "library" so I can learn more about neural networks.
My question: as I understand it, I have to take the derivative of each layer according to its activation function so I can calculate the deltas and adjust the weights, etc. (a simplified sketch of what I mean is right below).
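Roughly like this for a hidden layer (the names here are just placeholders for illustration, not my real code): I propagate the error back through the weights and multiply by the derivative of that layer's activation.

// Simplified sketch of computing hidden-layer deltas during backprop.
// weightsToNext[j][i] is the weight from hidden neuron i to next-layer neuron j,
// nextDelta are the deltas of the next layer, hiddenZ are the pre-activation sums.
static double[] hiddenDeltas(double[][] weightsToNext, double[] nextDelta, double[] hiddenZ) {
    double[] delta = new double[hiddenZ.length];
    for (int i = 0; i < hiddenZ.length; i++) {
        double backPropagatedError = 0;
        for (int j = 0; j < nextDelta.length; j++) {
            backPropagatedError += weightsToNext[j][i] * nextDelta[j];
        }
        // this is where the activation derivative of this layer comes in
        delta[i] = backPropagatedError * ActivationFunction.dSigmoid(hiddenZ[i]);
    }
    return delta;
}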
For ReLU, sigmoid, and tanh it's super simple to implement them in Java (which is the language I am using, BTW).
But to go backwards from the output to the input, I obviously have to start at the output layer, which uses softmax as its activation function.
So do I have to take the derivative of the output layer as well, or does that only apply to every other layer?
If I do need the derivative, how can I implement it in Java? Thanks.
I have read a lot of pages explaining the derivative of softmax, but they were really complicated for me; as I said, I just started learning ML and I didn't want to use an off-the-shelf library, so here I am.
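One thing some of those pages mentioned is that when softmax is paired with a cross-entropy loss, the softmax derivative supposedly cancels out and the output-layer delta is just the prediction minus the one-hot target. I'm not sure whether that means I can skip the softmax derivative entirely, but if it's true I guess it would look something like this (placeholder names again):

// Only valid IF the "softmax + cross-entropy" shortcut applies to my setup:
// the output-layer delta is simply the prediction minus the one-hot target.
static double[] outputDeltas(double[] softmaxOutput, double[] oneHotTarget) {
    double[] delta = new double[softmaxOutput.length];
    for (int i = 0; i < delta.length; i++) {
        delta[i] = softmaxOutput[i] - oneHotTarget[i];
    }
    return delta;
}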
This is the class where I store my activation functions:
public class ActivationFunction {

    public static double tanh(double val) {
        return Math.tanh(val);
    }

    public static double sigmoid(double val) {
        // logistic function: 1 / (1 + e^(-x))
        return 1 / (1 + Math.exp(-val));
    }

    public static double relu(double val) {
        return Math.max(val, 0);
    }

    public static double leaky_relu(double val) {
        return val > 0 ? val : 0.01 * val;
    }
    public static double[] softmax(double[] array) {
        // shift by the max for numerical stability; work on a copy so the input array is not modified
        double max = max(array);
        double sum = 0;
        double[] result = new double[array.length];
        for (int i = 0; i < array.length; i++) {
            result[i] = Math.exp(array[i] - max);
            sum += result[i];
        }
        for (int i = 0; i < result.length; i++) {
            result[i] /= sum;
        }
        return result;
    }
    public static double dTanh(double x) {
        // derivative of tanh(x) is 1 - tanh(x)^2
        double tan = Math.tanh(x);
        return 1 - tan * tan;
    }
    public static double dSigmoid(double x) {
        // x is the pre-activation value (same convention as dTanh): sigmoid(x) * (1 - sigmoid(x))
        double sig = sigmoid(x);
        return sig * (1 - sig);
    }
    public static double dRelu(double x) {
        return x > 0 ? 1 : 0;
    }

    public static double dLeaky_Relu(double x) {
        // slope 1 on the positive side, 0.01 (the leaky slope) otherwise
        return x > 0 ? 1 : 0.01;
    }
    private static double max(double[] array) {
        // start from negative infinity (Double.MIN_VALUE is the smallest positive double, not the most negative)
        double result = Double.NEGATIVE_INFINITY;
        for (double v : array) {
            if (v > result) result = v;
        }
        return result;
    }
}
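And here is my untested attempt at the full softmax derivative, based on the formula dS_i/dz_j = S_i * (1{i==j} - S_j) that kept showing up on those pages. I don't really understand it yet, so I have no idea if this is correct or even needed:

// Untested guess: the softmax derivative as a Jacobian matrix, where
// dS_i/dz_j = S_i * (1 - S_j) when i == j, and -S_i * S_j otherwise.
// softmaxOutput is assumed to already be the result of softmax().
public static double[][] dSoftmax(double[] softmaxOutput) {
    int n = softmaxOutput.length;
    double[][] jacobian = new double[n][n];
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            double kronecker = (i == j) ? 1.0 : 0.0;
            jacobian[i][j] = softmaxOutput[i] * (kronecker - softmaxOutput[j]);
        }
    }
    return jacobian;
}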
I am expecting an answer to the question: do I need the derivative of softmax or not? If so, how can I implement it?