6
votes

I have been struggling with rescaling the loadings (arrows) length in a ggplot2/ggfortify PCA. I have looked around extensively for an answer to this, and the only information I have found either code new biplot functions or refer to other entirely different packages for PCA (ggbiplot, factoextra), neither of which address the question I would like to answer:

Is it possible to scale/change size of PCA loadings in ggfortify?

Below is the code I have to plot a PCA using stock R functions as well as the code to plot a PCA using autoplot/ggfortify. You'll notice in the stock R plots I can scale the loads by simply multiplying by a scalar (*20 here) so my arrows aren't cramped in the middle of the PCA plot. Using autoplot...not so much. What am I missing? I'll move to another package if necessary but would really like to have a better understanding of ggfortify.

On other sites I have found, the graph axes limits never seem to exceed +/- 2. My graph goes +/- 20, and the loadings sit staunchly near 0, presumably at the same scale as graphs with smaller axes. I would still like to plot PCA using ggplot2, but if ggfortify won't do it then I need to find another package that will.

#load data geology rocks frame
georoc <- read.csv("http://people.ucsc.edu/~mclapham/earth125/data/georoc.csv")

#load libraries
library(ggplot2)
library(ggfortify)

geo.na <- na.omit(georoc) #remove NA values
geo_matrix <- as.matrix(geo.na[,3:29]) #create matrix of continuous data in data frame
pca.res <- prcomp(geo_matrix, scale = T) #perform PCA using correlation matrix (scale = T)
summary(pca.res) #return summary of PCA

#plotting in stock R
plot(pca.res$x, col = c("salmon","olivedrab","cadetblue3","purple")[geo.na$rock.type], pch = 16, cex = 0.2)
#make legend
legend("topleft", c("Andesite","Basalt","Dacite","Rhyolite"), 
       col = c("salmon","olivedrab","cadetblue3","purple"), pch = 16, bty = "n")
#add loadings and text
arrows(0, 0, pca.res$rotation[,1]*20, pca.res$rotation[,2]*20, length = 0.1)
text(pca.res$rotation[,1]*22, pca.res$rotation[,2]*22, rownames(pca.res$rotation), cex = 0.7)

#plotting PCA
autoplot(pca.res, data = geo.na, colour = "rock.type", #plot results, name using original data frame
         loadings = T, loadings.colour =  "black", loadings.label = T,
         loadings.label.colour = "black")

The data comes from an online file from a class I'm taking, so you could just copy this if you have the ggplot2 and ggfortify packages installed. Graphs below.

R plot of what I want ggplot to look like

What ggplot actually looks like

Edit: Adding reproducible code below.

iris.res <- 
  iris %>% 
  select(Sepal.Length:Petal.Width) %>% 
  as.matrix(.) %>% 
  prcomp(., scale = F)

autoplot(iris.res, data = iris, size = 4, col = "Species", shape = "Species", 
         x = 1, y = 2, #components 1 and 2
         loadings = T, loadings.colour = "grey50", loadings.label = T, 
         loadings.label.colour = "grey50", loadings.label.repel = T) + #loadings are arrows
  geom_vline(xintercept = 0, lty = 2) +
  geom_hline(yintercept = 0, lty = 2) +
  theme(aspect.ratio = 1) +
  theme_bw()
1
Your link to csv does not exist so I couldn't reproduce it. However, by looking at your result, it seems like your text labels are very messy so I'd suggest installing ggrepel package and then run your code again.Yuan Tang
Also, you can pass additional arguments in autoplot which will get passed to ggbiplot in ggfortify. Take a look at ?ggbiplot to see if you can find any arguments you need, e.g. loadings.label.size.Yuan Tang
Sorry about the link, it seems the coarse was chnaged and so were the URLs. Thanks for the tip!ginkgotrees
@YuanTang Hi, thanks for your tip! However, there still not solutions for changing the arrow size and the line width of the loadings with ggfortify.pengchy
@pengchy The answer might just be no. There is a scale function for autoplot, but it scales the entire plot by a chosen factor rather than just the arrow tips.ginkgotrees

1 Answers

4
votes

This answer is probably long after the OP needs it, but I'm offering it because I have been wrestling with the same issue for a while, and maybe I can save someone else the same effort.

# Load data
iris <- data.frame(iris)

# Do PCA
PCA <- prcomp(iris[,1:4])

# Extract PC axes for plotting
PCAvalues <- data.frame(Species = iris$Species, PCA$x)

# Extract loadings of the variables
PCAloadings <- data.frame(Variables = rownames(PCA$rotation), PCA$rotation)

    # Plot
ggplot(PCAvalues, aes(x = PC1, y = PC2, colour = Species)) +
  geom_segment(data = PCAloadings, aes(x = 0, y = 0, xend = (PC1*5),
     yend = (PC2*5)), arrow = arrow(length = unit(1/2, "picas")),
     color = "black") +
  geom_point(size = 3) +
  annotate("text", x = (PCAloadings$PC1*5), y = (PCAloadings$PC2*5),
     label = PCAloadings$Variables)

enter image description here

In order to increase the arrow length, multiply the loadings for the xend and yend in the geom_segment call. With a bit of trial and effort, can work out what number to use.

To place the labels in the correct place, multiply the PC axes by the same value in the annotate call.