5
votes

I'm looking for some advice on Dynamic Time Warping (DTW).

I have a Python script that extracts Mel-Frequency Cepstral Coefficient (MFCC) feature vectors from .WAV files of various lengths. Each file is represented by an array whose length varies with the file's duration, and each element of that array is itself an array of 12 MFCCs.

For example, one .WAV file may be represented by an array of 10 feature vectors (each holding 12 MFCCs), whilst another may be represented by an array of 20 such vectors.

I intend to use DTW to compare the two arrays of arrays, but I'm unsure how. I understand the concept of DTW and would have no trouble implementing it if the elements of each array were single numbers; my confusion arises because they are themselves arrays.

TL;DR: How would one compare two arrays of arrays using DTW?

Edit: I have read this question to no avail.

Many thanks, Adam

2
This project's documentation could help you: github.com/talcs/simpledtwSomethingSomething

2 Answers

3
votes

There is a nice tutorial on DTW here

I have done this in a dozen papers; see the zebra finch example here.

A key thing to note: you probably want to compare just ONE coefficient at a time, that is, one MFCC track from the first file against the corresponding track from the second. It is rarely useful to use all 12.

1
votes

There is a really nice example here.

Using the DTW package in Python, you can calculate the DTW distance between two sequences of Mel-Frequency Cepstral Coefficient (MFCC) feature vectors.

Even though the tutorial appears to use audio clips of the same length, it worked fine for me with clips of different lengths.

In other words, my arrays had shapes (3183, 12) and (3130, 12), and it worked just fine.
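To make the idea concrete without relying on any particular package, here is a minimal NumPy sketch of DTW on two sequences of feature vectors. It is only an illustration under some assumptions: the function name `dtw_distance` is made up for this example, the local cost is taken to be the Euclidean distance between two 12-dimensional frames, and random data stands in for real MFCCs. Compared with DTW on single numbers, the only thing that changes is how the cost between two frames is computed.

```python
import numpy as np

def dtw_distance(a, b):
    """DTW cost between two sequences of feature vectors.

    a has shape (n, d) and b has shape (m, d); n and m may differ.
    The local cost is the Euclidean distance between two frames
    (an assumption; other vector distances would also work).
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    # acc[i, j] = cheapest cost of aligning a[:i] with b[:j].
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        # Distance from frame a[i-1] to every frame of b, shape (m,).
        row_cost = np.linalg.norm(b - a[i - 1], axis=1)
        for j in range(1, m + 1):
            acc[i, j] = row_cost[j - 1] + min(
                acc[i - 1, j],      # a advances, b stays
                acc[i, j - 1],      # b advances, a stays
                acc[i - 1, j - 1],  # both advance
            )
    return acc[n, m]

# Random stand-ins for MFCC arrays of different lengths.
rng = np.random.default_rng(0)
x = rng.normal(size=(10, 12))  # 10 frames of 12 coefficients
y = rng.normal(size=(20, 12))  # 20 frames of 12 coefficients
print(dtw_distance(x, y))
```

The double Python loop is slow for arrays with thousands of frames, so for real recordings an optimized package is likely the better choice; the sketch is only meant to show where the vector distance enters.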

Although I do not completely understand your purpose, I think it's best to calculate DTW for each coefficient separately.
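As a rough sketch of the per-coefficient approach, each of the 12 columns can be treated as its own 1-D time series and warped independently. The helper names `dtw_1d` and `per_coefficient_dtw` are hypothetical, the local cost is assumed to be the absolute difference, and random data again stands in for real MFCCs.

```python
import numpy as np

def dtw_1d(s, t):
    """DTW cost between two 1-D sequences, using |s_i - t_j| as local cost."""
    n, m = len(s), len(t)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = abs(s[i - 1] - t[j - 1]) + min(
                acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1]
            )
    return acc[n, m]

def per_coefficient_dtw(a, b):
    """Run DTW separately on each coefficient (column) of a and b."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return np.array([dtw_1d(a[:, k], b[:, k]) for k in range(a.shape[1])])

# Random stand-ins for MFCC arrays of different lengths.
rng = np.random.default_rng(0)
mfcc_a = rng.normal(size=(10, 12))
mfcc_b = rng.normal(size=(20, 12))
distances = per_coefficient_dtw(mfcc_a, mfcc_b)
print(distances.shape)  # (12,), one DTW cost per coefficient
```

Note that each coefficient gets its own warping path here, so the 12 alignments need not agree with each other. Whether to keep the costs separate, sum them, or look at only one of them depends on the application.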

In that case, you can use this example.

If you are trying to do speech recognition, here is another example.