I want to extract audio features using an RBM (Restricted Boltzmann Machine). For this, I am giving the PCA-whitened spectrogram as input to the RBM.
For each audio file, the spectrogram is a matrix with a fixed number of columns but a different number of rows. My question is: how can I train my RBM, or how can I extract features from the audio using an RBM, given this spectrogram matrix? I read the following in a paper by Honglak Lee, "Unsupervised Feature Learning for Audio Classification Using Convolutional Deep Belief Networks": http://machinelearning.wustl.edu/mlpapers/paper_files/NIPS2009_1171.pdf
"We then trained 300 first layer bases with a filter length of 6 and a max-pooling ratio of 3."
First, what is meant by "bases" here? (They used convolutional deep belief networks, so I guess "bases" does not mean weights here.)
Second, what do they mean by a filter length of 6, and how can I implement it? Any hint will be appreciated. (I am new to RBMs.)
This isn't the appropriate forum for the type of question you're asking; a response is unlikely to be forthcoming. We welcome specific programming questions. You might be better off on [dsp.stackexchange.com].
– marko
@marko Thanks. I already posted it on the dsp forum, though no response yet.
– user35919
1 Answer
I think the confusing part is that they add a convolutional layer to their deep belief network. The idea of a convolutional layer is that each kernel looks only at a small region of the input (here, the spectrogram), in their case a 6-element window. I'm not an expert in audio problems, but I believe bases refer to the different bands in the spectrogram.
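To make the 6-element window concrete, here is a minimal NumPy sketch. It rests on several assumptions that are mine, not the paper's: the window slides along the row (time) axis of the spectrogram, each filter spans 6 consecutive rows and all columns, the hidden units are sigmoid as in a standard RBM, and the pooling is plain deterministic max-pooling over non-overlapping groups of 3. The function names `conv_hidden_activations` and `max_pool_time` and the sizes used are hypothetical. It shows only the forward pass with random filters, not training.

```python
import numpy as np

def conv_hidden_activations(spec, filters, bias):
    """Sigmoid activations of every filter at every time position.

    spec:    (n_frames, n_channels) spectrogram, n_frames may vary per file
    filters: (n_filters, filter_len, n_channels)
    bias:    (n_filters,)
    returns: (n_frames - filter_len + 1, n_filters)
    """
    filter_len = filters.shape[1]
    # All windows of filter_len consecutive rows:
    # shape (n_frames - filter_len + 1, n_channels, filter_len)
    windows = np.lib.stride_tricks.sliding_window_view(spec, filter_len, axis=0)
    # Dot product of each window with each filter (sum over channel and lag).
    pre = np.einsum("tcl,klc->tk", windows, filters) + bias
    return 1.0 / (1.0 + np.exp(-pre))

def max_pool_time(hidden, ratio):
    """Max over non-overlapping groups of `ratio` time steps (leftover steps dropped)."""
    n_steps = (hidden.shape[0] // ratio) * ratio
    grouped = hidden[:n_steps].reshape(-1, ratio, hidden.shape[1])
    return grouped.max(axis=1)

# Illustrative sizes only: 300 filters of length 6 over 8 channels.
rng = np.random.default_rng(0)
filters = 0.01 * rng.normal(size=(300, 6, 8))
bias = np.zeros(300)

for n_frames in (20, 32):  # two "files" with different numbers of rows
    spec = rng.normal(size=(n_frames, 8))
    hidden = conv_hidden_activations(spec, filters, bias)
    pooled = max_pool_time(hidden, ratio=3)
    print(n_frames, hidden.shape, pooled.shape)
```

Because the same filters are applied at every position, the filter shapes do not depend on the number of rows, so files of different lengths can share one set of weights; only the length of the output changes.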