I seem to have a problem in my implementation of a function that creates a frequency spectrum from an audio file. I am asking in the hope that someone can spot it.

You can download the 32-bit float WAV audio file here.

I am working on a script that performs a spectrum analysis of an audio file using SciPy and NumPy. Before I started, I analyzed the file using Sonic Visualizer, which gave me the following result:

Sonic Visualizer Result

I then tried to reproduce this result with my Python script, but got a different result:

Script Result

Everything looks right except the scale of the dB values. At 100 Hz, Sonic Visualizer shows -40 dB and my script shows -65 dB. So I assume there is a problem in how my script converts the FFT result to dBFS.

If I overlay the curve from Sonic Visualizer on my script's output, it is clear that the level conversion is off by some factor:

Comparison

A minimal version of my script, using the 'demo.wav' file above, looks like this:

from pathlib import Path

import matplotlib.pyplot as plt
import numpy as np
from scipy.io import wavfile
from scipy.signal import savgol_filter

def db_fft(data, sample_rate):
    data_length = len(data)
    weighting = np.hanning(data_length)
    data = data * weighting
    values = np.fft.rfft(data)
    frequencies = np.fft.rfftfreq(data_length, d=1. / sample_rate)
    s_mag = np.abs(values) * 2 / np.sum(weighting)
    s_dbfs = 20 * np.log10(s_mag)
    return frequencies, s_dbfs

audio_file = Path('demo.wav')
frequency, data = wavfile.read(str(audio_file))
data = data[0:4096]
x_labels, s_dbfs = db_fft(data, frequency)
flat_data = savgol_filter(s_dbfs, 601, 3)
plt.style.use('seaborn-v0_8-whitegrid')  # named 'seaborn-whitegrid' before Matplotlib 3.6
plt.figure(dpi=150, figsize=(16, 9))
plt.semilogx(x_labels, s_dbfs, alpha=0.4, color='tab:blue', label='Spectrum')
plt.semilogx(x_labels, flat_data, color='tab:blue', label='Spectrum (with filter)')
plt.grid(True)
plt.title(audio_file.name)
plt.ylim([-160, 0])
plt.xlim([10, 10000])
plt.xlabel('Frequency [Hz]')
plt.ylabel('Amplitude [dB]')
plt.grid(True, which="both")
target_name = audio_file.parent / (audio_file.stem + '.png')
plt.savefig(str(target_name))

The script converts the 32-bit float audio file into a dBFS spectrum diagram, using the first 4096 samples as the window, as Sonic Visualizer does.

Where is the problem in my script? Why do I get a different result?

2 Answers

4
votes

1. Different decibels

The first big difference is that Sonic Visualizer uses the "power ratio" definition of the decibel, from this Wikipedia page:

When expressing a power ratio, the number of decibels is ten times its logarithm to base 10.

I have also verified this in the v4.0.1 source code (in svcore/base/AudioLevel.cpp, line 54):

double dB = 10 * log10(multiplier);
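
To see what this difference means for the plot, here is a minimal sketch that applies both conventions to the same linear value. The helper names `to_db_field` and `to_db_power` are made up for illustration; they are not part of Sonic Visualizer or NumPy.

```python
import numpy as np

def to_db_field(x):
    """Field (amplitude) convention, as in the question's script: 20 * log10(x)."""
    return 20 * np.log10(x)

def to_db_power(x):
    """Power convention, as in the quoted AudioLevel.cpp line: 10 * log10(x)."""
    return 10 * np.log10(x)

# Linear value that the question's script would plot at -65 dB
magnitude = 10 ** (-65 / 20)

field_db = to_db_field(magnitude)
power_db = to_db_power(magnitude)
print(f"20*log10: {field_db:.1f} dB, 10*log10: {power_db:.1f} dB")
```

The same linear value lands at half the dB figure under the power convention, so this change alone moves a -65 dB reading to -32.5 dB.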

2. Different magnitude calculation

When calculating the magnitude, the code appears simply to divide by the size of the window. This changes the calculation to

s_mag = np.abs(values) * 2 / data_length
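
How much this shifts the level can be checked directly. The sketch below assumes the same 4096-sample Hann window as the question. For that window, `np.sum(weighting)` is about half of `data_length`, so dividing by `data_length` roughly halves every magnitude.

```python
import numpy as np

data_length = 4096
weighting = np.hanning(data_length)

# Scale factor from the question's script vs. the window-size normalisation
scale_question = 2 / np.sum(weighting)
scale_window_size = 2 / data_length

ratio = scale_window_size / scale_question  # equals sum(weighting) / data_length
shift_20 = 20 * np.log10(ratio)  # shift if levels use 20 * log10
shift_10 = 10 * np.log10(ratio)  # shift if levels use 10 * log10
print(f"ratio={ratio:.4f}, shift={shift_20:.2f} dB (20*log10) or {shift_10:.2f} dB (10*log10)")
```

The shape of the spectrum is unchanged; the whole curve just moves down by a constant offset.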

3. "Corrected" result

I have not found a way to export their spectrum, but I have manually read off the first few values (note: the linear values, not the dB values) as

theirvalues = [
    0.00074, 
    0.000745865, 
    0.00119605, 
    0.0013713, 
    0.0011812, 
    0.000746891, 
    0.000334177,
    0.000163241,
    7.57671e-5,
    3.17983e-5,
    2.91934e-5,
    3.74938e-5
]

With the two changes I have mentioned, the graphs compare as follows:

Comparison graph

It's still not an exact match, but it's much closer. I suspect there may still be some smoothing of some kind (there are references to hops in the code, but I can't quite suss out what they're doing).
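
Putting the two changes together, a revised version of the question's function might look like the sketch below. The name `db_fft_power` and the small `eps` floor (which avoids `log10(0)` on silent bins) are my own additions, and the 44100 Hz sample rate in the check is an assumption. This only mirrors the two differences described above, not Sonic Visualizer's full processing.

```python
import numpy as np

def db_fft_power(data, sample_rate, eps=1e-20):
    """Hann-windowed spectrum using the two changes described above."""
    data_length = len(data)
    weighting = np.hanning(data_length)
    values = np.fft.rfft(data * weighting)
    frequencies = np.fft.rfftfreq(data_length, d=1.0 / sample_rate)
    # Change 2: normalise by the window size instead of the window sum
    s_mag = np.abs(values) * 2 / data_length
    # Change 1: power-ratio decibels (10 * log10 instead of 20 * log10)
    s_db = 10 * np.log10(np.maximum(s_mag, eps))
    return frequencies, s_db

# Synthetic check: a full-scale sine centred on bin 100 (sample rate is an assumption)
sample_rate = 44100
n = np.arange(4096)
tone = np.sin(2 * np.pi * 100 * n / 4096)
frequencies, s_db = db_fft_power(tone, sample_rate)
peak_bin = int(np.argmax(s_db))
print(peak_bin, round(float(s_db[peak_bin]), 2))
```

With this normalisation a full-scale sine peaks at roughly -3 dB rather than 0 dB, so the resulting levels are only comparable to other spectra computed the same way.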

0
votes

As you noted, your two results differ by a constant factor that is approximately 2.

From Wikipedia's entry on Decibel (my emphasis):

Two different scales are used when expressing a ratio in decibels, depending on the nature of the quantities: power and field (root-power). When expressing a power ratio, the number of decibels is ten times its logarithm to base 10.[2] That is, a change in power by a factor of 10 corresponds to a 10 dB change in level. When expressing field (root-power) quantities, a change in amplitude by a factor of 10 corresponds to a 20 dB change in level. The decibel scales differ by a factor of two so that the related power and field levels change by the same number of decibels with linear loads.
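
The relationship between the two scales can be verified numerically. In this sketch the amplitude values are arbitrary, and power is assumed to be proportional to the square of the amplitude (the "linear loads" condition in the quote).

```python
import numpy as np

amplitude = np.array([1.0, 0.1, 0.01, 0.001])  # arbitrary field quantities
power = amplitude ** 2                          # power scales with amplitude squared

field_db = 20 * np.log10(amplitude)  # field (root-power) scale
power_db = 10 * np.log10(power)      # power scale

# Both scales report the same level for the same physical signal
print(field_db)
print(power_db)
```

A mismatch only appears when 10 * log10 is applied to an amplitude, or 20 * log10 to a power, which halves or doubles the dB figure.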

You are using a factor of 20:

s_dbfs = 20 * np.log10(s_mag)

If you change the scalar to 10, you get this image:

enter image description here

This may or may not explain your scale difference. The source code of Sonic Visualizer is on SourceForge, so it should be easy to check (SourceForge would not let me set my tracking policy, so I did not go there myself).