4
votes

I want to use spectrograms to classify audio files with a CNN. The problem is that my audio files have different lengths (between 2 and 17 seconds), and when I generate the spectrograms they all have the same size, which means the spectrum is stretched for the shorter audio files. How can I generate the spectrograms so that the signal is not altered?

I tried using the matplotlib.pyplot library for creating the spectrograms but all the images are 640 x 480.

This is the code I used:

import matplotlib.pyplot as plt
from scipy.io import wavfile

samplingFrequency, signalData = wavfile.read('dia0_utt0.wav')

plt.title('Spectrogram')

plt.specgram(signalData,Fs=samplingFrequency,NFFT=512)

plt.xlabel('Time')

plt.ylabel('Frequency')

plt.savefig('fig11.png')

I don't know how to obtain spectrograms whose dimensions vary with the audio length, or how to keep the same dimensions but fill the remainder up to the maximum length with no information. For example, if I have a 3-second file and the maximum length is 17 seconds, I would generate the spectrogram for the 3 seconds and fill the rest with silence so that it spans 17 seconds.

2 Answers

4
votes

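You can use the matplotlib.pyplot.xlim and matplotlib.pyplot.ylim functions to set the limits of both axes. Fixing the x-axis to the longest duration (17 seconds here) keeps the time scale identical across files, so shorter clips are no longer stretched.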

[EDITED] In the example below, I load a 3-second WAV file of the Cantina Band song, downloaded from this website:

import matplotlib.pyplot as plt
from scipy.io import wavfile

samplingFrequency, signalData = wavfile.read('C:/Users/Sheldon/Desktop/WAVEEXAMPLE/CantinaBand3.wav')


plt.title('Spectrogram')    
Pxx, freqs, bins, im = plt.specgram(signalData,Fs=samplingFrequency,NFFT=512)
plt.xlabel('Time')
plt.ylabel('Frequency')
plt.xlim(left=0,right=17)

plt.savefig('C:/Users/Sheldon/Desktop/WAVEEXAMPLE/fig11.png')

This script yields the following image:

[Image: spectrogram of the 3-second clip drawn on a time axis from 0 to 17 seconds]

Had I not specified plt.xlim(left=0, right=17), the output figure would have ranged between 0 and 3 seconds:

[Image: the same spectrogram with the time axis from 0 to 3 seconds]
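
An alternative to widening the axis is to pad the audio itself with silence before computing the spectrogram, which is what the question describes. The following is a minimal sketch and not part of the original answer: `pad_to_duration` is a hypothetical helper name, and it assumes the samples arrive from `wavfile.read` as a NumPy array (mono, or samples x channels). Clips longer than the target are truncated, which may or may not be what you want.

```python
import numpy as np

def pad_to_duration(signal, sample_rate, target_seconds):
    """Right-pad a signal with zeros (silence) up to target_seconds.

    Hypothetical helper for illustration. Works on 1-D (mono) arrays and on
    2-D arrays shaped (samples, channels). Longer signals are truncated.
    """
    signal = np.asarray(signal)
    target_len = int(round(target_seconds * sample_rate))
    missing = target_len - signal.shape[0]
    if missing <= 0:
        return signal[:target_len]
    # Pad only along the time axis (axis 0); leave channels untouched.
    pad_width = [(0, missing)] + [(0, 0)] * (signal.ndim - 1)
    return np.pad(signal, pad_width, mode="constant")

# Usage sketch with the variables from the example above:
# padded = pad_to_duration(signalData, samplingFrequency, 17)
# plt.specgram(padded, Fs=samplingFrequency, NFFT=512)
```

The padded region contains no energy, so on the default dB scale matplotlib may emit a divide-by-zero warning and leave that part of the image blank.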

2
votes

You can also use the Python library librosa. Here is complete code for your requirement:

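import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np

filename = 'dia0_utt0.wav'  # path to the input file (name taken from the question)
sig, fs = librosa.load(filename, sr=44100)  # resample to 44.1 kHz; change sr as needed
save_path = filename[:-4] + '.png'
plt.figure(figsize=(6.40, 4.80), dpi=100)  # 6.4 x 4.8 inches at 100 dpi gives a 640x480 image
plt.axes([0., 0., 1., 1.], frameon=False, xticks=[], yticks=[])  # axes fill the whole figure
# adjust n_fft, hop_length and n_mels to your requirements
S = librosa.feature.melspectrogram(y=sig, sr=fs, n_fft=4096, hop_length=2205, n_mels=512)
# x_axis='time' puts the x coordinates in seconds, so the xlim below is in seconds
librosa.display.specshow(librosa.power_to_db(S, ref=np.max), sr=fs, hop_length=2205, x_axis='time', cmap='jet')
plt.axis('off')  # hide ticks and labels
plt.xlim(left=0, right=17)  # same 0 to 17 s time axis for every file
plt.savefig(save_path, dpi=100, bbox_inches=None, pad_inches=0)
plt.close()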
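
The saved image size in pixels is the figure size in inches multiplied by the dpi. Matplotlib's default of 6.4 x 4.8 inches at 100 dpi is why the images in the question all came out as 640 x 480. If you would rather have images whose width grows with the clip length instead of a fixed 17-second axis, one option is to derive the figure width from the duration. This is only a sketch: `figsize_for_duration` and its default of 40 pixels per second are hypothetical choices, not something from the question or from matplotlib.

```python
def figsize_for_duration(duration_s, pixels_per_second=40, height_px=480, dpi=100):
    """Return (width_in, height_in) so the image width scales with duration.

    Hypothetical helper for illustration; pixels_per_second is an arbitrary
    choice that fixes the time scale shared by all images.
    """
    width_px = max(1, round(duration_s * pixels_per_second))
    return width_px / dpi, height_px / dpi

# Usage sketch (assumes matplotlib is installed and sig, fs are loaded as above):
# import matplotlib.pyplot as plt
# duration_s = len(sig) / fs
# plt.figure(figsize=figsize_for_duration(duration_s), dpi=100)
# ... draw the spectrogram without calling plt.xlim ...
# plt.savefig(save_path, dpi=100)
```

With these defaults a 17-second clip gives a figure about 680 pixels wide and a 3-second clip about 120 pixels wide, so every image shares the same time scale. Keep in mind that many CNN setups expect inputs of one fixed size, in which case the fixed-axis or padding approach is the simpler fit.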