I ran a PCA in Python on audio spectrograms and face the following problem: I have a matrix in which each row consists of the flattened features of one song. I understand that applying PCA reduces the number of dimensions, but I can't find the resulting low-dimensional vectors anywhere in the original dataset.

import glob

import numpy as np
import pylab
from scipy.io.wavfile import read

# Collect the WAV files; each one is read below to get its sample rate and signal

files = glob.glob('../some/*.wav')

song_list = []

for wav in files:

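    # fs is the sample rate, x is a NumPy array containing the signal
    (fs, x) = read(wav)

    # Combine the first two channels to make a mono signal out of stereo
    # (a file that is already mono is used as-is)
    channel = x[:, :2].mean(axis=1) if x.ndim == 2 else x
    # Keep only the first 1024 samples
    channel = channel[:1024]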
    # Generate spectrogram
    # freqs is the same for different songs, t differs slightly
    Pxx, freqs, t, plot = pylab.specgram(
        channel,
        NFFT=128,
        Fs=fs,  # use the file's own sample rate instead of a hard-coded 44100
        detrend=pylab.detrend_none,
        window=pylab.window_hanning,
        noverlap=int(128 * 0.5))
    # Keep only the first two frequency rows of the spectrogram
    Pxx = Pxx[0:2]
    X_flat = Pxx.flatten()
    song_list.append(X_flat)

song_matrix = np.vstack(song_list)

If I now apply PCA to the song_matrix...

from sklearn import decomposition


pca = decomposition.PCA(n_components=2)
song_matrix_pca = pca.fit_transform(song_matrix.T)


# These components should be most helpful to discriminate between the songs due to their high variance
pca.components_

...the final 2 components are the following:

[Figure: Final components - two dimensions from 15 wav-files]

The problem is that I can't find those two vectors in the original dataset with all its dimensions. What am I doing wrong, or am I misinterpreting the whole thing?

2 Answers

1
votes

PCA doesn't give you vectors that already exist in your dataset. From Wikipedia: "Principal component analysis (PCA) is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components. The number of principal components is less than or equal to the number of original variables. This transformation is defined in such a way that the first principal component has the largest possible variance (that is, accounts for as much of the variability in the data as possible), and each succeeding component in turn has the highest variance possible under the constraint that it is orthogonal to the preceding components."
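To make this concrete, here is a minimal sketch using only NumPy, with random numbers standing in for the real `song_matrix` (the 15 x 30 shape and the variable names are assumptions for illustration). It computes the principal components from an SVD of the mean-centered data and checks that they are new orthonormal directions in feature space rather than rows copied from the data. Note also that scikit-learn's `PCA` expects one sample per row, so with `song_matrix.T` as in the question the 15 songs are treated as features and each component has length 15.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for song_matrix: 15 songs (rows) x 30 features (columns).
X = rng.normal(size=(15, 30))

# PCA via SVD of the mean-centered data.
X_centered = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

components = Vt[:2]                  # two directions in feature space, shape (2, 30)
scores = X_centered @ components.T   # each song described by 2 numbers, shape (15, 2)

print(components.shape, scores.shape)

# The components are orthonormal directions...
print(np.allclose(components @ components.T, np.eye(2)))

# ...and none of them coincides with a row of the original data.
matches = [np.allclose(c, row) for c in components for row in X]
print(any(matches))
```

Each component is a weighted combination of all the original features, which is why searching the dataset for it comes up empty.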

1
votes

Say you have a column vector V containing ONE flattened spectrogram. PCA will find a matrix M whose columns are orthogonal vectors (think of them as being at right angles to every other column in M).

Multiplying the transpose of M by V gives a vector T of "scores", one per column of M. Across the whole dataset, the variance of each score tells you how much of the data's variance the corresponding column of M captures, and each successive column of M captures progressively less.

Multiplying the transpose of M' (where M' is the first 2 columns of M) by V produces a 2x1 vector T' representing the "dimension-reduced spectrogram". You can reconstruct an approximation of V by multiplying M' by T'. M' is not square, so it has no true inverse, but because its columns are orthonormal its transpose plays that role. This works if you have a matrix of spectrograms, too. Keeping only two principal components gives an extremely lossy compression of your data.
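The projection and reconstruction can be sketched as follows. This is an illustration on random stand-in data, not the asker's spectrograms; the names `M2` and `T2` (for M' and T') and the 15 x 30 shape are assumptions, and the mean is subtracted before projecting, as PCA implementations normally do.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: 15 flattened spectrograms (rows) with 30 features each.
X = rng.normal(size=(15, 30))
mean = X.mean(axis=0)

# Columns of M are the orthonormal principal directions.
_, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
M = Vt.T           # shape (30, 15)
M2 = M[:, :2]      # M': the first two columns, shape (30, 2)

V = X[0]                       # one flattened spectrogram
T2 = M2.T @ (V - mean)         # T': the two scores for this spectrogram
V_approx = M2 @ T2 + mean      # lossy reconstruction from just two numbers

# Using every column of M recovers V up to floating-point error.
T_all = M.T @ (V - mean)
V_full = M @ T_all + mean

err_2 = np.linalg.norm(V - V_approx)
err_all = np.linalg.norm(V - V_full)
print(T2.shape, err_2, err_all)
```

The gap between `err_2` and `err_all` is the information discarded by keeping only two components.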

But what if you want to add a new song to your dataset? Unless it is very much like the original songs (meaning it introduces little variance to the original dataset), there's no reason to think that the vectors of M will describe the new song well. For that matter, even rescaling all the elements of one song's V by a constant changes the covariance of the data and can change M. PCA is quite data specific, which is why it is not typically used for general-purpose image or audio compression.

The good news? You can use a discrete cosine transform (DCT) to compress your training data. Instead of deriving directions from the data, it uses a fixed set of cosines as a descriptive basis, so it doesn't suffer from the data-specific limitation. The DCT or a variant of it is used in JPEG, MP3 and other compression schemes.
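As a rough sketch of that idea, the snippet below builds an orthonormal DCT-II basis by hand with NumPy and keeps only the first few coefficients of a signal. The helper `dct_basis`, the test signal, and the choice of 8 retained coefficients are all assumptions made for illustration; a library routine such as `scipy.fft.dct` would normally be used instead of building the matrix explicitly.

```python
import numpy as np

def dct_basis(n):
    """Orthonormal DCT-II basis; row k is the k-th cosine (hypothetical helper)."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (i + 0.5) * k / n)
    C[0] /= np.sqrt(2.0)   # rescale the constant row so every row has unit length
    return C

n = 64
C = dct_basis(n)   # depends only on n, never on the data

# Hypothetical smooth signal standing in for one row of features.
x = np.cos(2 * np.pi * np.arange(n) / n) + 0.5 * np.arange(n) / n

coeffs = C @ x                 # forward transform
keep = 8                       # number of low-frequency coefficients to keep
compressed = np.zeros(n)
compressed[:keep] = coeffs[:keep]
x_approx = C.T @ compressed    # inverse transform of the truncated coefficients

rel_err = np.linalg.norm(x - x_approx) / np.linalg.norm(x)
print(np.allclose(C @ C.T, np.eye(n)), rel_err)
```

Because the basis is the same for every signal of length `n`, a new song can be transformed without refitting anything, unlike the PCA matrix M.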