Audio feature extraction using a restricted Boltzmann machine

Question

I want to extract audio features using an RBM (restricted Boltzmann machine). As input to the RBM, I am using the PCA-whitened spectrogram.
For each audio file, the spectrogram is a matrix with a fixed number of columns but a different number of rows. My question is: given this spectrogram matrix, how can I train my RBM, or how can I extract features from the audio with it? I read the following in a paper by Honglak Lee, "Unsupervised Feature Learning for Audio Classification Using Convolutional Deep Belief Networks": http://machinelearning.wustl.edu/mlpapers/paper_files/NIPS2009_1171.pdf
"We then trained 300 first layer bases with a filter length of 6 and a max-pooling ratio of 3."
First, what is meant by "bases" here? (They used convolutional deep belief networks, so I guess bases do not mean weights here.)
Second, what do they mean by a filter length of 6, and how can I do that myself? Any hint will be appreciated. (I am new to RBMs.)

This isn't the appropriate forum for the type of question you're asking; a response is unlikely to be forthcoming. We welcome specific programming questions. You might be better off on [dsp.stackexchange.com]. — marko
@marko Thanks. I already posted it on the dsp forum, though no response yet. — user35919

aplassard · Accepted Answer · 2013-12-13T13:46:41

I think what is confusing here is that they add a convolutional layer to their deep belief network. The idea of a convolutional layer is that it uses kernels that each look at only a small region of the image (here, the spectrogram), in their case a 6-element window. I'm not an expert in audio problems, but I believe "bases" refers to the different bands in the spectrogram.
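To make the 6-element window concrete, here is a minimal NumPy sketch. It assumes the window slides along the time axis (the rows of the spectrogram), so files with different numbers of rows simply yield different numbers of windows. That reading is an assumption, not something confirmed above. The helper names, the 80-column width, and the random weights are all hypothetical placeholders; a trained RBM would supply the real weights. Plain max pooling is used as a stand-in for whatever pooling the paper's convolutional model actually applies.

```python
import numpy as np

def extract_windows(spec, filter_len=6):
    """Flatten every run of `filter_len` consecutive rows into one vector."""
    n_frames, n_channels = spec.shape
    n_windows = n_frames - filter_len + 1
    if n_windows < 1:
        raise ValueError("spectrogram is shorter than the filter length")
    # Row i of idx selects frames i .. i + filter_len - 1.
    idx = np.arange(n_windows)[:, None] + np.arange(filter_len)[None, :]
    return spec[idx].reshape(n_windows, filter_len * n_channels)

def hidden_activations(windows, weights, bias):
    """Sigmoid hidden-unit probabilities, as in a standard RBM up-pass."""
    return 1.0 / (1.0 + np.exp(-(windows @ weights + bias)))

def max_pool_time(acts, ratio=3):
    """Max over non-overlapping groups of `ratio` consecutive time steps."""
    n_steps = (acts.shape[0] // ratio) * ratio  # drop the ragged tail
    return acts[:n_steps].reshape(-1, ratio, acts.shape[1]).max(axis=1)

rng = np.random.default_rng(0)
# 6, 300 and 3 come from the quoted sentence; 80 columns is an assumed width.
n_channels, filter_len, n_hidden = 80, 6, 300
# Random placeholders standing in for trained RBM parameters.
weights = rng.normal(scale=0.01, size=(filter_len * n_channels, n_hidden))
bias = np.zeros(n_hidden)

for n_frames in (50, 123):  # two "files" with different numbers of rows
    spec = rng.normal(size=(n_frames, n_channels))
    windows = extract_windows(spec, filter_len)
    pooled = max_pool_time(hidden_activations(windows, weights, bias), ratio=3)
    print(n_frames, windows.shape, pooled.shape)
```

Every window has the same length (6 x number of columns), so windows pooled from all files can form a single training set for an ordinary RBM, regardless of how many rows each spectrogram has. Applying the learned weights to every window of a file is then equivalent to convolving along time.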
