1
votes

I was looking into the interpretation of a biplot and the meaning of loadings and scores in PCA in this question: What are the principal components scores?

According to the author of the first answer the scores are:

          x      y
John  -44.6  33.2
Mike  -51.9  48.8
Kate  -21.1  44.35

According to the second answer, regarding "The interpretation of the four axes in a biplot":

The left and bottom axes are showing [normalized] principal component scores; the top and right axes are showing the loadings.

So, theoretically after plotting the biplot from "What are principal components scores" I should get on the left and bottom axes the scores:

          x      y
John  -44.6  33.2
Mike  -51.9  48.8
Kate  -21.1  44.35

and on the right and top the loadings.

I entered the data he provided in R:

DF<-data.frame(Maths=c(80, 90, 95), Science=c(85, 85, 80), English=c(60, 70, 40), Music=c(55, 45, 50))
pca = prcomp(DF, scale = FALSE)
biplot(pca)

This is the plot I got. First, the left and bottom axes represent the loadings of the principal components. The top and right axes represent the scores, BUT they do not correspond to the scores the author of the post provided: observation 3 (Kate) has positive scores on the plot, whereas according to Tony Breyal's first answer to that question her score on PC1 is negative.

If I am doing or understanding something wrong, where is my mistake?

[Figure: biplot produced by biplot(pca)]


1 Answer

3
votes

There are a few nuances you missed:

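  1. biplot.princomp function

By default (scale = 1), biplot rescales the scores and the loadings by different powers of the singular values, so the two sets of axes are on different scales and the scores you see are transformed. (Since pca was created with prcomp, the method that actually runs is biplot.prcomp, which shares its help page with biplot.princomp.) To get the actual values you can invoke the biplot function like this: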

biplot(pca, scale=0)

See help(biplot.princomp) for more.

With scale=0 the plotted values are the actual scores. You can confirm this by comparing the plot with pca$x.
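
As a rough check of what the default scaling does, the sketch below compares the raw scores in pca$x with the coordinates I would expect biplot to draw when scale = 1. The rescaling step (dividing each score column by sdev * sqrt(n)) is my reading of how biplot.prcomp behaves in the stats package, so treat it as an assumption; the names scores, lam and default_coords are only illustrative.

```r
DF <- data.frame(Maths = c(80, 90, 95), Science = c(85, 85, 80),
                 English = c(60, 70, 40), Music = c(55, 45, 50))
pca <- prcomp(DF, scale. = FALSE)

# Actual scores on the first two components: what biplot(pca, scale = 0) plots
scores <- pca$x[, 1:2]

# Assumed default rescaling (scale = 1): divide each column by sdev * sqrt(n)
n <- nrow(scores)
lam <- pca$sdev[1:2] * sqrt(n)
default_coords <- sweep(scores, 2, lam, "/")

print(round(scores, 2))
print(round(default_coords, 2))
```

The second matrix is on a much smaller scale than the first, which is why the default plot does not show the values stored in pca$x.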

  2. Centering

However, the result is still not the same as in the answer you found on Cross Validated.

This is because Tony Breyal calculated the scores manually, using non-centered data. The prcomp function centers the data by default and then uses the centered data to get the scores.
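
To see the effect of centering in numbers, here is a minimal sketch that projects the raw (uncentered) data onto the loadings, in the spirit of the manual calculation, and compares the result with pca$x. The names raw_scores, shift and difference are illustrative, and the signs of the components may differ between platforms.

```r
DF <- data.frame(Maths = c(80, 90, 95), Science = c(85, 85, 80),
                 English = c(60, 70, 40), Music = c(55, 45, 50))
pca <- prcomp(DF, scale. = FALSE)

# prcomp stores the column means it subtracted before the decomposition
print(pca$center)
print(colMeans(DF))

# Project the raw, uncentered data onto the first two loading vectors
raw_scores <- as.matrix(DF) %*% pca$rotation[, 1:2]

# The uncentered and centered scores differ by a constant shift per component:
# the projection of the column means onto the loadings
shift <- drop(colMeans(DF) %*% pca$rotation[, 1:2])
difference <- raw_scores - pca$x[, 1:2]

print(round(raw_scores, 2))
print(round(difference, 2))  # every row equals `shift`
```

So the manually computed scores and the prcomp scores describe the same configuration of points, only moved by a fixed offset along each component.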

So you can center the data first:

> scale(DF, scale=FALSE)
         Maths   Science    English Music
[1,] -8.333333  1.666667   3.333333     5
[2,]  1.666667  1.666667  13.333333    -5
[3,]  6.666667 -3.333333 -16.666667     0

And now use these numbers (rounded here) with the loadings from the answer to get the scores:

     x                                                       y
John 0.28*(-8.3) + -0.17*1.7    + -0.94*3.3     + 0.07*5     0.77*(-8.3) + -0.08*1.7    + 0.19*3.3     + -0.60*5
Mike 0.28*1.7    + -0.17*1.7    + -0.94*13.3    + 0.07*(-5)  0.77*1.7    + -0.08*1.7    + 0.19*13.3    + -0.60*(-5)
Kate 0.28*6.7    + -0.17*(-3.3) + -0.94*(-16.7) + 0.07*0     0.77*6.7    + -0.08*(-3.3) + 0.19*(-16.7) + -0.60*0

After doing this you should get the same scores as plotted by biplot(pca, scale=0), up to rounding.
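
The same arithmetic can be done in one step as a matrix product, which avoids the rounding in the hand calculation. This is a minimal sketch; centered and manual_scores are illustrative names.

```r
DF <- data.frame(Maths = c(80, 90, 95), Science = c(85, 85, 80),
                 English = c(60, 70, 40), Music = c(55, 45, 50))
pca <- prcomp(DF, scale. = FALSE)

# Subtract the column means, as prcomp does by default
centered <- scale(DF, center = TRUE, scale = FALSE)

# Scores = centered data multiplied by the loadings (rotation matrix)
manual_scores <- centered %*% pca$rotation

# Compare with the scores stored by prcomp, ignoring attributes such as dimnames
print(all.equal(manual_scores, pca$x, check.attributes = FALSE))
print(round(manual_scores[, 1:2], 2))
```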

Hope this helps.