I'm trying to implement this recurrent neural network (it's a Voice Activity Detector):

[figure: diagram of the RNN architecture]

Note that the blue circles are individual neurons, not groups of neurons; it's a really small network. There are some extra details, like what the S's mean and the fact that some layers are quadratic, but they don't matter for this question.

I implemented it using Microsoft's CNTK like this (not tested!):

# For the layers with diagonal connections.
QuadraticWithDiagonal(X, Xdim, Ydim)
{
    # Input and output values from the previous time-step. The forward
    # reference to Y is fine: the network is a declarative graph and
    # PastValue breaks the cycle.
    OldX = PastValue(Xdim, 1, X)
    OldY = PastValue(Ydim, 1, Y)

    # Quadratic weights.
    Wqaa = LearnableParameter(Ydim, Xdim)
    Wqbb = LearnableParameter(Ydim, Xdim)
    Wqab = LearnableParameter(Ydim, Xdim)
    # Linear weights. Wlc multiplies OldY, so it must be Ydim x Ydim.
    Wla = LearnableParameter(Ydim, Xdim)
    Wlb = LearnableParameter(Ydim, Xdim)
    Wlc = LearnableParameter(Ydim, Ydim)
    # Bias.
    Wb = LearnableParameter(Ydim, 1)

    XSquared = ElementTimes(X, X)
    OldXSquared = ElementTimes(OldX, OldX)
    CrossXSquared = ElementTimes(X, OldX)

    T1 = Times(Wqaa, XSquared)
    T2 = Times(Wqbb, OldXSquared)
    T3 = Times(Wqab, CrossXSquared)

    T4 = Times(Wla, X)
    T5 = Times(Wlb, OldX)
    T6 = Times(Wlc, OldY)

    # Plus is binary, so chain the sums.
    Y = Plus(T1, Plus(T2, Plus(T3, Plus(T4, Plus(T5, Plus(T6, Wb))))))
}
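
For reference, writing ∘ for the element-wise product, each diagonal layer computes

    y(t) = Wqaa·(x(t) ∘ x(t)) + Wqbb·(x(t-1) ∘ x(t-1)) + Wqab·(x(t) ∘ x(t-1)) + Wla·x(t) + Wlb·x(t-1) + Wlc·y(t-1) + Wb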


# For the layers without diagonal connections.
QuadraticWithoutDiagonal(X, Xdim, Ydim)
{
    # Previous time-step's output (forward reference to Y, as above).
    OldY = PastValue(Ydim, 1, Y)

    Wqaa = LearnableParameter(Ydim, Xdim)
    Wla = LearnableParameter(Ydim, Xdim)
    # Wlc multiplies OldY, so it must be Ydim x Ydim.
    Wlc = LearnableParameter(Ydim, Ydim)
    # Bias.
    Wb = LearnableParameter(Ydim, 1)

    XSquared = ElementTimes(X, X)

    T1 = Times(Wqaa, XSquared)
    T4 = Times(Wla, X)
    T6 = Times(Wlc, OldY)

    Y = Plus(T1, Plus(T4, Plus(T6, Wb)))
}
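
This is the same as above with the x(t-1) terms dropped:

    y(t) = Wqaa·(x(t) ∘ x(t)) + Wla·x(t) + Wlc·y(t-1) + Wb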


# The actual network.

# 13x1 input: PLP features.
I = InputValue(13, 1, tag="feature")
# Hidden layers
H0 = QuadraticWithDiagonal(I, 13, 3)
H1 = QuadraticWithDiagonal(H0, 3, 3)
# 1x1 Pre-output
P = Tanh(QuadraticWithoutDiagonal(H1, 3, 1))
# 5x1 Delay taps
D = QuadraticWithoutDiagonal(P, 1, 5)
# 1x1 Output
O = Tanh(QuadraticWithoutDiagonal(D, 5, 1))

The PastValue() function gets the value of a layer from the previous time-step. This makes it really easy to implement unusual RNNs like this one.
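
In case it helps explain what I'm after: in plain Python, PastValue just means "remember last step's values" in a loop. Here's a toy NumPy re-implementation of QuadraticWithDiagonal unrolled over time (the function and variable names here are mine, not from any framework):

import numpy as np

def quadratic_with_diagonal(xs, Wqaa, Wqbb, Wqab, Wla, Wlb, Wlc, b):
    """Run one diagonal layer over a sequence xs of shape (T, Xdim).

    Wqaa, Wqbb, Wqab, Wla, Wlb are (Ydim, Xdim); Wlc is (Ydim, Ydim); b is (Ydim,).
    """
    old_x = np.zeros(Wla.shape[1])  # what PastValue(Xdim, 1, X) returns at t=0
    old_y = np.zeros(Wlc.shape[1])  # what PastValue(Ydim, 1, Y) returns at t=0
    ys = []
    for x in xs:
        y = (Wqaa @ (x * x) + Wqbb @ (old_x * old_x) + Wqab @ (x * old_x)
             + Wla @ x + Wlb @ old_x + Wlc @ old_y + b)
        old_x, old_y = x, y  # this "shift" is all PastValue does
        ys.append(y)
    return np.stack(ys)  # shape (T, Ydim)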

Unfortunately, although CNTK's Network Description Language is pretty awesome, the fact that you can't script the data input, training, and evaluation steps is rather restrictive. So I'm looking into implementing the same network in Torch or TensorFlow.

I've read the documentation for both, though, and I still have no clue how to implement the recurrent connections. Both libraries seem to equate RNNs with LSTM black boxes that you stack as if they were non-recurrent layers. There doesn't seem to be an equivalent of PastValue(), and all the examples that don't just use a pre-made LSTM layer are completely opaque.
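
The closest thing I've found in TensorFlow is tf.scan, which threads a state tuple through a sequence, so something like the sketch below might stand in for PastValue. This is written against a recent TensorFlow API, with the dimensions of my first hidden layer, and I can't tell whether it's the idiomatic way to do it:

import tensorflow as tf

Xdim, Ydim = 13, 3  # first hidden layer of the VAD net

def weight(rows, cols):
    return tf.Variable(tf.random.normal([rows, cols], stddev=0.1))

Wqaa, Wqbb, Wqab = weight(Ydim, Xdim), weight(Ydim, Xdim), weight(Ydim, Xdim)
Wla, Wlb = weight(Ydim, Xdim), weight(Ydim, Xdim)
Wlc = weight(Ydim, Ydim)
b = tf.Variable(tf.zeros([Ydim]))

def step(state, x):
    # state carries what PastValue would have returned.
    old_x, old_y = state
    y = (tf.linalg.matvec(Wqaa, x * x)
         + tf.linalg.matvec(Wqbb, old_x * old_x)
         + tf.linalg.matvec(Wqab, x * old_x)
         + tf.linalg.matvec(Wla, x)
         + tf.linalg.matvec(Wlb, old_x)
         + tf.linalg.matvec(Wlc, old_y)
         + b)
    return (x, y)  # becomes (old_x, old_y) at the next step

frames = tf.zeros([100, Xdim])  # dummy sequence: 100 frames of PLP features
init = (tf.zeros([Xdim]), tf.zeros([Ydim]))
_, ys = tf.scan(step, frames, initializer=init)  # ys has shape (100, Ydim)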

Can anyone show me how to implement a network like this in either Torch or TensorFlow (or both!)?

Hi @Timmmm, I was also reading this paper and I'm wondering whether you've successfully implemented and trained this RNN-based VAD. If so, do you plan to open-source your implementation? Or do you know of an open implementation anywhere? Thanks! – ispin2121
No, I haven't, and I don't know of any open-source implementations. If I ever get it working I will open-source it, but currently I'm working on a hotword detector. I'm going to wait until @chrisbasoglu's scripting interface is out to try again. – Timmmm
By the way, I found the use of PLPs to be poorly justified. With my hotword detector I've had some success with letting the network learn the filters, so I'm going to try that here. It should be simpler too (but will require more training data or regularisation, as there are more learnable parameters). – Timmmm
Thanks for sharing the info. If you would like someone to collaborate on the project, I'm happy to see if there is anything I can contribute. My interest is more in having a complete, standalone C implementation that targets embedded environments (or even building an equivalent, low-power hardware solution). However, it is always good to have a working baseline, even if it is in another language or relies on a framework. – ispin2121

1 Answer

I am a Microsoft employee. CNTK is currently being componentized, and its key high-level building blocks will be offered in the form of a C++/Python library. The goal is increased flexibility in using the toolkit, through extensibility, interoperability with external code, and composability. These library components compose and interoperate to form the core training and evaluation capabilities needed in deep-learning workloads.

The library also allows one or more of these components to be implemented externally, so external C++/Python code and libraries can be composed with CNTK's library components. Expect this towards the end of August.