I am trying to apply principal component analysis (PCA) to reduce the dimensionality of my data. The data is 200x146: 200 observations (samples) with 146 features (dimensions), and each observation belongs to one of three classes. What I am trying to do is visualize the data, to see how the class centroids move after adding new samples. Since it is impossible to plot such high-dimensional data directly, I am looking for a low-dimensional representation in which the data forms almost separate class clusters.
I know that PCA computes the eigenvectors and eigenvalues of the data's covariance matrix, where each eigenvalue gives the variance along its eigenvector. The higher the variance, the more the data is spread out along that direction and the better it is for visualization. The eigenvector with the highest eigenvalue is the first principal component; the next component is then the direction of maximal remaining variance orthogonal to it, and so on. (Did I understand the basic idea of PCA correctly?)
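To check that I have the idea right, I tried to reproduce it by hand with eig on the covariance matrix. This is only a rough sketch, assuming trndata is my 200x146 matrix with the class label in the last column and that the features need to be mean-centered first:

X = trndata(:, 1:end-1);           % feature columns only
Xc = X - mean(X);                  % center each feature (implicit expansion, R2016b+)
C = cov(Xc);                       % covariance matrix of the features
[V, D] = eig(C, 'vector');         % eigenvectors (columns of V) and eigenvalues
[D, order] = sort(D, 'descend');   % sort by variance, largest first
V = V(:, order);
proj = Xc * V(:, 1:2);             % coordinates along the top two principal axes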
However, I don't understand what information I actually get when I use the MATLAB function pca(). I get the coefficients, but what do they tell me, and how do I proceed afterwards?
data=trndata;                               % 200x146, class label in the last column
[coeff,score]=pca(data(:,1:end-1));         % PCA on the feature columns only
newinputdata=coeff(:, 1:3)*score(:, 1:3)';  % rank-3 reconstruction of the centered data
newinputdata=newinputdata';                 % back to observations in rows
% row indices of each class, taken from the label column
class1i=find(data(:,end)==1);
class2i=find(data(:,end)==2);
class3i=find(data(:,end)==3);
class1=newinputdata(class1i,:);
class2=newinputdata(class2i,:);
class3=newinputdata(class3i,:);
x=1;   % which two columns of newinputdata to plot
y=2;
figure;
hold on
plot(class1(:,x), class1(:,y),'ro')
plot(class2(:,x), class2(:,y),'go')
plot(class3(:,x), class3(:,y),'bo')
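From the documentation, my understanding is that score is already the centered data expressed in principal-component coordinates (score = (X - mean(X)) * coeff), so I have also been wondering whether I should simply plot the first two score columns directly instead. Something like this (just a sketch, I am not sure this is the right way to proceed):

labels = data(:, end);
figure;
gscatter(score(:,1), score(:,2), labels, 'rgb', 'o')   % one colour per class
xlabel('PC 1'); ylabel('PC 2');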