
I'm using PCA in S-Plus to generate a biplot from my dataset (here).

The script I run on my data is:

a <- princomp(x = ~ ., data = Week.2.Mon.portsweep, scores = T, cor = F)
a$loadings
a$scores
biplot(a, scale = F)

The biplot result is here.

As far as I understand it, the biplot can be read as follows:

  1. Left & bottom axes: scores on PC1 & PC2

  2. Right & top axes: loading values on PC1 & PC2

  3. The observations are drawn in black and positioned by their PC scores.

  4. The arrow vectors indicate which variables contribute most to each PC.

  5. The position of each arrow's label is determined by that variable's loadings on PC1 and PC2.

  6. Arrow length - ???

However, I don't know what the length of an arrow is based on. Some references I have read say that the arrow length represents a proportion of variance. Is that true? How can we calculate it from the biplot?

Could you help me? Thanks.


1 Answer


I looked at the source of stats:::biplot.princomp and stats:::biplot.default in R.
The arrow lengths are calculated as follows.
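Written as a formula, my reading of those two functions is roughly the following. The symbols are notation introduced here for illustration, not names from the R source, and the default pc.biplot = FALSE is assumed. For variable $j$ with loadings $l_{j1}, l_{j2}$ on the two chosen components, the arrow runs from the origin to the point $a_j$, measured on the top and right axes:

```latex
$$
a_j = 0.8\,\bigl(\lambda_1 l_{j1},\ \lambda_2 l_{j2}\bigr),
\qquad
\lambda_k =
\begin{cases}
1 & \text{if } \mathrm{scale} = 0,\\
\bigl(s_k \sqrt{n}\bigr)^{\mathrm{scale}} & \text{otherwise,}
\end{cases}
\qquad
\lVert a_j \rVert = 0.8\,\sqrt{\lambda_1^2 l_{j1}^2 + \lambda_2^2 l_{j2}^2}
$$
```

Here $s_k$ stands for pca$sdev[k] and $n$ for pca$n.obs. The factor 0.8 only shortens the arrows so that the variable labels fit at their tips.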

(1) When biplot is called with scale = F (treated the same as scale = 0):

# Data generating process
set.seed(12345)
library(MASS)
n <- 100
mu <- c(1,-2,-0.5)
Sigma <- diag(rep(1,3))
X <- mvrnorm(n, mu=mu, Sigma=Sigma)

pca <- princomp(X, cor = TRUE, scores = TRUE)
biplot(pca, choices = 1:2, scale = FALSE)

# Calculate the arrow end points; with scale = FALSE the multiplier lam is 1
lam <- 1
len <- t(t(pca$loadings[, 1:2]) * lam) * 0.8

# Draw the computed arrows in green; they should coincide with biplot's red ones
# (arrows() is vectorised, so no mapply() is needed)
arrows(0, 0, len[, 1], len[, 2], col = "green", length = 0.1)


(2) When biplot is called with scale = 0.5:

scale <- 0.5
biplot(pca, choices = 1:2, scale = scale)

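# For a non-zero scale, each component's multiplier is (sdev * sqrt(n))^scale
lam <- (pca$sdev[1:2] * sqrt(pca$n.obs))^scale
len <- t(t(pca$loadings[, 1:2]) * lam) * 0.8

# Overlay the computed arrows in green on top of biplot's red ones
arrows(0, 0, len[, 1], len[, 2], col = "green", length = 0.1)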

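To get the arrow lengths as numbers rather than reading them off the plot, the same calculation can be wrapped in a small function. This is only a sketch: arrow_ends is a hypothetical helper written for this answer, not part of R, the data are simulated stand-ins, and the default pc.biplot = FALSE is assumed.

```r
# Stand-in data (simulated; replace with your own numeric matrix or data frame)
set.seed(12345)
n <- 100
X <- matrix(rnorm(n * 3), ncol = 3)
pca <- princomp(X, cor = TRUE)

# Hypothetical helper: arrow end points as computed for the plot
# by biplot.princomp / biplot.default (pc.biplot = FALSE assumed)
arrow_ends <- function(pca, choices = 1:2, scale = 1) {
  lam <- if (scale != 0) (pca$sdev[choices] * sqrt(pca$n.obs))^scale else 1
  t(t(pca$loadings[, choices]) * lam) * 0.8
}

# Euclidean length of each variable's arrow for scale = 0.5
ends <- arrow_ends(pca, scale = 0.5)
arrow_length <- sqrt(rowSums(ends^2))
print(arrow_length)

# With scale = 1 and cor = TRUE, undoing the 0.8 and sqrt(n) factors and
# squaring gives the share of each variable's (unit) variance that is
# captured by the two plotted components
share <- (sqrt(rowSums(arrow_ends(pca, scale = 1)^2)) / (0.8 * sqrt(pca$n.obs)))^2
print(share)
```

So the arrow length is tied to variance only in the scale = 1 case, and then only through the rescaling shown for share. With scale = 0 the length depends on the loadings alone.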