I'm building an LSTM network from scratch, based on my own understanding of how LSTM cells work.
There are no layers, so I'm implementing non-vectorized forms of the equations I see in tutorials. I'm also using peepholes from the cell state.
So far, I understand that it looks like this: [diagram: LSTM network]
With that, I've made these equations for each of the gates for the forward pass:
i_t = sigmoid( i_w * (x_t + c_{t-1}) + i_b )
f_t = sigmoid( f_w * (x_t + c_{t-1}) + f_b )
cell_gate = tanh( c_w * x_t + c_b )
c_t = (f_t * c_{t-1}) + (i_t * cell_gate)
o_t = sigmoid( o_w * (x_t + c_t) + o_b )
h_t = o_t * tanh(c_t)
Where the _w's are the weights for each respective gate, the _b's are the biases, and c_{t-1} is the cell state from the previous timestep. Also, I've named the candidate activation (the tanh drawn on the far left of the diagram) the "cell_gate".
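To make this concrete, here's my forward pass as a minimal scalar sketch in code. The function name `lstm_forward_step` and the `params` dict are just placeholders I made up for this post:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_forward_step(x_t, c_prev, params):
    # params holds the scalar weights/biases from the equations above:
    # i_w, i_b, f_w, f_b, c_w, c_b, o_w, o_b
    i_t = sigmoid(params["i_w"] * (x_t + c_prev) + params["i_b"])  # input gate, peephole on c_{t-1}
    f_t = sigmoid(params["f_w"] * (x_t + c_prev) + params["f_b"])  # forget gate, peephole on c_{t-1}
    cell_gate = math.tanh(params["c_w"] * x_t + params["c_b"])     # candidate value
    c_t = (f_t * c_prev) + (i_t * cell_gate)                       # new cell state
    o_t = sigmoid(params["o_w"] * (x_t + c_t) + params["o_b"])     # output gate, peephole on c_t
    h_t = o_t * math.tanh(c_t)                                     # hidden output
    return h_t, c_t
```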
The backward pass is where things get fuzzy for me; I'm not sure how to derive these equations correctly.
I know that, in general, error is calculated as: error = f'(x_t) * (received_error), where f'(x_t) is the first derivative of the activation function. received_error is either (target - output) for output neurons, or ∑(o_e * w_io) for hidden neurons, where o_e is the error of one of the cells the current cell outputs to and w_io is the weight connecting them.
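As a sketch, that generic rule looks like this in code (`neuron_error` and `sigmoid_deriv` are just helper names I made up here, with sigmoid standing in for f):

```python
import math

def sigmoid_deriv(z):
    # derivative of the logistic sigmoid at the pre-activation z
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1.0 - s)

def neuron_error(pre_activation, received_error):
    # error = f'(x_t) * received_error, with f = sigmoid here
    return sigmoid_deriv(pre_activation) * received_error
```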
I'm not sure whether the LSTM cell as a whole is considered a neuron, so I treated each of the gates as its own neuron and tried to calculate an error signal for each, then used the error signal from the cell gate alone to pass back up the network. For the output gate:
o_e = sigmoid'( o_w * (x_t + c_t) + o_b ) * (received_error)
o_w += o_l * x_t * o_e
o_b += o_l * sigmoid(o_b) * o_e
where o_l is the learning rate.
...The rest of the gates follow the same format...
Then the error for the entire LSTM cell is equal to o_e.
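In code, my per-gate backward step for the output gate looks like this minimal sketch (`output_gate_backward` is a placeholder name, and `sigmoid`/`sigmoid_deriv` are the helpers from the sketches above):

```python
def output_gate_backward(x_t, c_t, received_error, params, o_l):
    # pre-activation for the output gate, matching the forward pass
    z = params["o_w"] * (x_t + c_t) + params["o_b"]
    # gate error signal: sigmoid'(z) * received_error
    o_e = sigmoid_deriv(z) * received_error
    # weight and bias updates as written above
    # (the bias update is the part I'm least sure about)
    params["o_w"] += o_l * x_t * o_e
    params["o_b"] += o_l * sigmoid(params["o_b"]) * o_e
    return o_e
```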
Then, for an LSTM cell above the current cell, the error it receives is equal to:
tanh'(x_t) * ∑(o_e * w_io)
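Or, as a sketch (`error_from_below` and the list of (o_e, w_io) pairs are made-up names for this post):

```python
import math

def error_from_below(x_t, downstream):
    # downstream is a list of (o_e, w_io) pairs, one per cell
    # that this cell's output feeds into
    tanh_deriv = 1.0 - math.tanh(x_t) ** 2
    return tanh_deriv * sum(o_e * w_io for o_e, w_io in downstream)
```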
Is this all correct? Am I doing anything completely wrong?