2
votes

I ran a principal component analysis and kept the first two principal components, then plotted my points by their scores on these two PCs. I would like to add a 95% confidence region based on Hotelling's T^2 to this plot, in order to detect the points that fall outside the ellipse (outliers). How can this be done in R? Do you have an example?

The idea is to draw the ellipse on the score plot and flag the points that lie outside it.

2 Answers

0
votes

You can draw 95% confidence ellipses (one per group) on a PCA plot with vegan or ggbiplot, as below:

set.seed(1)
data <- matrix(rnorm(500), ncol=5) # some random data
data <- setNames(as.data.frame(rbind(data, matrix(runif(25, 5, 10), ncol=5))), LETTERS[1:5]) # add some outliers
class <- sample(c(0,3,6,8), 105, replace=TRUE) # 4 groups

library(vegan)
PC <- rda(data, scale=TRUE)
pca_scores <- scores(PC, choices=c(1,2))
plot(pca_scores$sites[,1], pca_scores$sites[,2],
     pch=class, col=as.integer(factor(class)), # map groups to colours 1..4; col=0 would be the background colour
     xlim=c(-2,2), ylim=c(-2,2))
arrows(0,0,pca_scores$species[,1],pca_scores$species[,2],lwd=1,length=0.2)
ordiellipse(PC,class,conf=0.95)

library(ggbiplot)
PC <- prcomp(data, scale = TRUE)
ggbiplot(PC, obs.scale = 1, var.scale = 1, groups = as.factor(class),
         ellipse = TRUE, ellipse.prob = 0.95)
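
The ellipses above are drawn per group, so they do not by themselves tell you which points are outliers. A minimal base-R sketch of the flagging step follows, assuming a single ellipse for all points and the commonly used limit A(n-1)/(n-A) * F(A, n-A); other conventions for the limit exist, so treat the threshold as an assumption. The names X, sc, t2, t2_lim and outliers are illustrative only.

```r
set.seed(1)
# Illustrative data: 100 regular rows plus 5 shifted rows
X <- rbind(matrix(rnorm(500), ncol = 5),
           matrix(runif(25, 5, 10), ncol = 5))

pc <- prcomp(X, scale. = TRUE)
sc <- pc$x[, 1:2]          # scores on PC1 and PC2
n  <- nrow(sc)
A  <- ncol(sc)

# Hotelling's T^2 per observation: PCA scores are centred and uncorrelated,
# so T^2 is the sum of squared scores divided by the score variances
t2 <- rowSums(sweep(sc^2, 2, pc$sdev[1:A]^2, "/"))

# 95% limit (assumed convention, see the note above)
t2_lim <- A * (n - 1) / (n - A) * qf(0.95, A, n - A)

# Row indices of the points that fall outside the 95% ellipse
outliers <- which(t2 > t2_lim)
```

Points with t2 above t2_lim lie outside the ellipse, so they can be highlighted with, for example, points(sc[outliers, ], col = "red") on an existing score plot.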

0
votes

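The pcaMethods package provides simpleEllipse(x, y, alpha, len), which does this. Given two uncorrelated score vectors x and y (for example the PC1 and PC2 scores), it returns the coordinates of an ellipse whose axes are scaled by the variance of each score and by a quantile of the F distribution. Here alpha sets the confidence level (0.95 for a 95% region) and len is the number of points used to trace the ellipse.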
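
To show the idea without the package, here is a rough base-R sketch of such an ellipse. The helper t2_ellipse is hypothetical and is not pcaMethods' implementation; in particular the scaling factor 2(n-1)/(n-2) * F(2, n-2) is one common convention and may differ from the one the package uses.

```r
# Hypothetical helper (not part of pcaMethods): ellipse for two uncorrelated score vectors
t2_ellipse <- function(x, y, alpha = 0.95, len = 200) {
  n <- length(x)
  # Assumed Hotelling T^2 limit for two components
  f <- 2 * (n - 1) / (n - 2) * qf(alpha, 2, n - 2)
  theta <- seq(0, 2 * pi, length.out = len)
  # Semi-axes are sqrt(variance of each score * limit), centred on the score means
  cbind(x = mean(x) + sqrt(var(x) * f) * cos(theta),
        y = mean(y) + sqrt(var(y) * f) * sin(theta))
}

set.seed(1)
sc  <- prcomp(matrix(rnorm(500), ncol = 5), scale. = TRUE)$x[, 1:2]
ell <- t2_ellipse(sc[, 1], sc[, 2])

# Score plot with the ellipse overlaid
plot(sc, asp = 1,
     xlim = range(ell[, "x"], sc[, 1]),
     ylim = range(ell[, "y"], sc[, 2]))
lines(ell)
```

Because the axes are aligned with the plot axes, this only makes sense for uncorrelated inputs such as PCA scores.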