How can I create spectrograms from .wav files in Python for an audio classification problem?

Question

I want to use spectrograms to classify audio files with a CNN. The problem is that my audio files have different lengths (between 2 and 17 seconds), and when I generate the spectrograms, they all have the same size, which means the spectrum is stretched for the shorter audio files. How can I generate the spectrograms so that the signal is not distorted?

I tried using the matplotlib.pyplot library to create the spectrograms, but all the images are 640 x 480.
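For reference, Matplotlib's default figure size is 6.4 x 4.8 inches at 100 dpi, which is why every saved PNG is 640 x 480 regardless of the clip length. The spectrogram array itself does scale with duration. The minimal sketch below assumes SciPy's scipy.signal.spectrogram with a 512-sample window, 50% overlap, a hypothetical 16 kHz sampling rate and silent test signals; it prints the (frequency bins, time frames) shape for 2-, 3- and 17-second inputs.

```python
import numpy as np
from scipy import signal

def spectrogram_shape(duration_s, fs=16000, nfft=512):
    """Return (n_freq_bins, n_time_frames) for a silent signal of the given duration.

    fs=16000 is an assumed sampling rate chosen only for illustration.
    """
    x = np.zeros(int(duration_s * fs))
    # nperseg=512 mirrors NFFT=512 from the question; 50% overlap is an assumption
    _, _, sxx = signal.spectrogram(x, fs=fs, nperseg=nfft, noverlap=nfft // 2)
    return sxx.shape

for seconds in (2, 3, 17):
    print(seconds, "s ->", spectrogram_shape(seconds))
```

The frequency axis (257 bins for a 512-point window) stays fixed while the time axis grows roughly linearly with duration, so forcing every clip into the same image size necessarily stretches the short ones.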

This is the code I used:

import matplotlib.pyplot as plt
from scipy.io import wavfile

samplingFrequency, signalData = wavfile.read('dia0_utt0.wav')

plt.title('Spectrogram')
plt.specgram(signalData, Fs=samplingFrequency, NFFT=512)
plt.xlabel('Time')
plt.ylabel('Frequency')
plt.savefig('fig11.png')
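If spectrograms whose width depends on the clip length are acceptable for the model, one option is to make the figure width proportional to the duration so that every image has the same number of pixels per second. The sketch below assumes mono or stereo WAV files readable by scipy.io.wavfile; the function name save_scaled_spectrogram and the px_per_second and height_px values are hypothetical choices, not part of the original code.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt
import numpy as np
from scipy.io import wavfile

def save_scaled_spectrogram(wav_path, png_path, px_per_second=50,
                            height_px=256, dpi=100, nfft=512):
    """Save a spectrogram image whose width grows linearly with the audio duration.

    px_per_second and height_px are illustrative values, not from the question.
    """
    fs, data = wavfile.read(wav_path)
    if data.ndim == 2:                 # stereo -> mono by averaging the channels
        data = data.mean(axis=1)
    duration = len(data) / fs
    width_px = max(1, int(round(duration * px_per_second)))
    fig = plt.figure(figsize=(width_px / dpi, height_px / dpi), dpi=dpi)
    ax = fig.add_axes([0, 0, 1, 1])    # axes fill the whole figure, no margins
    ax.specgram(data, Fs=fs, NFFT=nfft, noverlap=nfft // 2)
    ax.axis("off")                     # no ticks, labels or frame in the image
    fig.savefig(png_path, dpi=dpi)
    plt.close(fig)                     # free memory when processing many files
    return width_px, height_px
```

With these settings a 3-second clip gives a 150-pixel-wide image and a 6-second clip a 300-pixel-wide one, so the time scale is identical across files.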

I don't know how to obtain spectrograms whose dimensions vary with the clip length, or how to give them all the same dimensions by filling the remainder up to the maximum length with no information. For example, if I have a 3-second file and the maximum length is 17 seconds, I would like to generate the spectrogram for the 3 seconds and fill the rest with silence so that it spans 17 seconds.
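The padding described here can be done on the signal before the spectrogram is computed: appending zeros (digital silence) up to the maximum length gives every clip the same number of time frames, and the padded region contains no energy. This is a minimal sketch, assuming the 17-second maximum from the question, SciPy's scipy.signal.spectrogram and a hypothetical helper name fixed_size_spectrogram; it returns a NumPy array in decibels for feeding a CNN directly rather than a plotted image.

```python
import numpy as np
from scipy import signal

MAX_SECONDS = 17  # longest clip in the dataset, per the question

def fixed_size_spectrogram(x, fs, max_seconds=MAX_SECONDS, nfft=512):
    """Zero-pad (or trim) a signal to max_seconds, then return a log spectrogram.

    Zero samples are digital silence, so the padded region carries no energy.
    """
    target_len = int(max_seconds * fs)
    x = np.asarray(x, dtype=np.float32)
    if x.ndim == 2:                    # stereo -> mono by averaging the channels
        x = x.mean(axis=1)
    if len(x) < target_len:
        x = np.pad(x, (0, target_len - len(x)))  # append zeros at the end
    else:
        x = x[:target_len]             # trim anything longer than the maximum
    _, _, sxx = signal.spectrogram(x, fs=fs, nperseg=nfft, noverlap=nfft // 2)
    return 10 * np.log10(sxx + 1e-10)  # epsilon avoids log(0) in the silent part
```

A 3-second and a 17-second clip at the same sampling rate produce arrays of identical shape. Files with different sampling rates would still differ in frame count, so resampling them to a common rate first would be needed (not shown).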

Sheldon · Accepted Answer · 2019-03-23T10:28:13

You can use the matplotlib.pyplot.xlim and matplotlib.pyplot.ylim functions to set the limits of both axes.
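Note that xlim and ylim only change the visible range of the plot; the spectrogram values computed by plt.specgram are not altered. If the model is fed arrays rather than images, the array-level equivalent of extending the time axis to 17 seconds is to zero-pad the power matrix along its time axis. The sketch below uses matplotlib.mlab.specgram, which computes the same power matrix that plt.specgram returns as its first value; the helper names n_frames and padded_specgram are hypothetical, and noverlap=128 is Matplotlib's documented default for specgram.

```python
import numpy as np
from matplotlib import mlab

def n_frames(n_samples, nfft=512, noverlap=128):
    """Number of spectrogram columns for a signal of n_samples samples."""
    return (n_samples - noverlap) // (nfft - noverlap)

def padded_specgram(x, fs, max_seconds=17, nfft=512, noverlap=128):
    """Compute a power spectrogram and zero-pad its time axis to max_seconds of frames."""
    pxx, freqs, _ = mlab.specgram(x, NFFT=nfft, Fs=fs, noverlap=noverlap)
    target = n_frames(int(max_seconds * fs), nfft, noverlap)
    missing = target - pxx.shape[1]
    if missing > 0:
        pxx = np.pad(pxx, ((0, 0), (0, missing)))  # zero power = silence
    return pxx[:, :target], freqs
```

Padding in the spectrogram domain like this avoids computing FFTs over the silent tail, at the cost of relying on the frame-count formula matching the specgram settings.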

[EDITED] In the example below, I load a 3-second WAV file of the Cantina Band song downloaded from this website:

import matplotlib.pyplot as plt
from scipy.io import wavfile

samplingFrequency, signalData = wavfile.read('C:/Users/Sheldon/Desktop/WAVEEXAMPLE/CantinaBand3.wav')

plt.title('Spectrogram')
Pxx, freqs, bins, im = plt.specgram(signalData, Fs=samplingFrequency, NFFT=512)
plt.xlabel('Time')
plt.ylabel('Frequency')
plt.xlim(left=0, right=17)

plt.savefig('C:/Users/Sheldon/Desktop/WAVEEXAMPLE/fig11.png')

This script yields the following image:

Had I not specified plt.xlim(left=0, right=17), the output figure would have ranged between 0 and 3 seconds:
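When a whole dataset is converted, the hard-coded 17 can be replaced by the longest duration actually found, and ylim can be fixed to the highest Nyquist frequency so that every image shares the same time and frequency scale. A sketch under these assumptions: all clips are WAV files readable by scipy.io.wavfile in one folder, they fit in memory at once, and save_spectrograms_same_scale is a hypothetical helper name. The images keep Matplotlib's default size, including axes and labels.

```python
import glob
import os
import matplotlib
matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt
from scipy.io import wavfile

def save_spectrograms_same_scale(wav_dir, out_dir, nfft=512):
    """Save one spectrogram per WAV file, all sharing the same axis limits."""
    clips = []
    for path in sorted(glob.glob(os.path.join(wav_dir, "*.wav"))):
        fs, data = wavfile.read(path)
        if data.ndim == 2:             # stereo -> mono by averaging the channels
            data = data.mean(axis=1)
        clips.append((path, fs, data))
    if not clips:
        raise ValueError(f"no .wav files found in {wav_dir}")
    max_duration = max(len(d) / fs for _, fs, d in clips)  # replaces the fixed 17
    max_nyquist = max(fs / 2 for _, fs, _ in clips)
    os.makedirs(out_dir, exist_ok=True)
    out_paths = []
    for path, fs, data in clips:
        fig, ax = plt.subplots()
        ax.specgram(data, Fs=fs, NFFT=nfft)
        ax.set_xlim(0, max_duration)   # same time axis for every clip
        ax.set_ylim(0, max_nyquist)    # same frequency axis for every clip
        ax.set_xlabel("Time [s]")
        ax.set_ylabel("Frequency [Hz]")
        name = os.path.splitext(os.path.basename(path))[0] + ".png"
        out = os.path.join(out_dir, name)
        fig.savefig(out)
        plt.close(fig)                 # avoid accumulating open figures
        out_paths.append(out)
    return max_duration, out_paths
```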

