I am currently studying principal component analysis and playing around with the R prcomp
function. My code is as follows:
library(dplyr)

# Log-transform the four measurements and keep only Species plus the log columns
iris1 = mutate( iris,
                Species = factor( Species ),
                logSepalLength = log10( Sepal.Length ),
                logSepalWidth = log10( Sepal.Width ),
                logPetalLength = log10( Petal.Length ),
                logPetalWidth = log10( Petal.Width )
              ) %>%
  dplyr::select( Species, starts_with("log") )

# PCA on the four log-transformed variables (centered, not scaled)
iris1.PCA = prcomp( ~ logSepalLength +
                      logSepalWidth +
                      logPetalLength +
                      logPetalWidth,
                    data = iris1, scale. = FALSE )
summary(iris1.PCA)
summary(iris1.PCA)
The output of summary(iris1.PCA) is as follows:
Importance of components:
                          PC1     PC2     PC3     PC4
Standard deviation     0.4979 0.06009 0.05874 0.02337
Proportion of Variance 0.9702 0.01413 0.01350 0.00214
Cumulative Proportion  0.9702 0.98436 0.99786 1.00000
I want to use ggplot to generate a nice scree plot that shows the cumulative contribution to total variance for each principal component. I can do this calculation manually, starting from the covariance matrix, using something like cumsum(eigenvals)/iris1.cov.trace. However, according to summary(iris1.PCA), the prcomp summary already reports the cumulative proportion for us! So how do we use that part of the summary(iris1.PCA) object together with ggplot to generate a nice scree plot? I know we can manually copy the output values, but I'm looking for a more automated solution (since hard-coding values is not good software engineering practice).
I found this example of a scree plot using ggplot (although it does not use cumulative contribution to total variance):
var_explained_df %>%
  ggplot(aes(x = PC, y = var_explained, group = 1)) +
  geom_point(size = 4) +
  geom_line() +
  labs(title = "Scree plot: PCA on scaled data")
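One direction that might work, as a hedged sketch: the object returned by summary() on a prcomp fit carries the printed table as a matrix in its importance element, so the "Cumulative Proportion" row can be pulled out by name and passed to ggplot in the same style as the example above. The names log_iris, pca, scree_df and cum_prop are illustrative assumptions, and the PCA is refit here only to keep the snippet self-contained.

```r
library(ggplot2)

# Refit the PCA on the log-transformed measurements (self-contained stand-in for iris1.PCA)
log_iris <- log10(iris[, c("Sepal.Length", "Sepal.Width",
                           "Petal.Length", "Petal.Width")])
pca <- prcomp(log_iris, scale. = FALSE)

# summary() stores the printed table as a matrix: rows are statistics, columns are PCs
imp <- summary(pca)$importance

# One row per component; the factor levels keep PC1..PC4 in order on the x axis
scree_df <- data.frame(
  PC = factor(colnames(imp), levels = colnames(imp)),
  cum_prop = imp["Cumulative Proportion", ]
)

# Same layers as the example above, with the cumulative proportion on the y axis
p <- ggplot(scree_df, aes(x = PC, y = cum_prop, group = 1)) +
  geom_point(size = 4) +
  geom_line() +
  labs(title = "Scree plot: cumulative proportion of variance",
       y = "Cumulative proportion of variance")
print(p)
```

Note that the values in the importance matrix are rounded by summary(), so cumsum(pca$sdev^2) / sum(pca$sdev^2) may be preferable if full precision matters.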