I have a data frame in R that holds PCA data and looks roughly like this:

obsnames PC1 PC2 PC3
one 2.46 2.57 1.366962e-15
two -3.47 0.84 3.053113e-16
three 1.01 -3.40 7.077672e-16

You can recreate the exact object with:

pca_data <- structure(list(obsnames = c("one", "two", "three"), PC1 = c(2.46310908247957, 
-3.46877162330214, 1.00566254082257), PC2 = c(2.56831624877025, 
0.836571395923965, -3.40488764469422), PC3 = c(1.36696209906972e-15, 
3.05311331771918e-16, 7.07767178198537e-16), `Sample Size` = c(48L, 
74L, 52L)), row.names = c("one", "two", "three"), class = "data.frame")

Now, I'm trying to plot this PCA with ggplot2's geom_point(), using only the shapes that support the "fill" aesthetic (shapes 21 to 25). However, I can't get the legend to match both the shape and the fill color shown in the plot. I gave up trying to figure it out myself, and I find it very strange given that I'm specifying almost everything manually. This is my plotting code:

library(ggplot2)
len <- length(pca_data$obsnames)
ggplot(pca_data, aes_string(x = x, y = y)) +
  geom_point(shape = rep_len(c(21, 22, 23, 24, 25), length.out = len),
             color = "black", size = 3, aes(fill = obsnames)) +
  theme_bw() + theme(legend.position = "right") +
  xlab(label_x) + ylab(label_y) + ggtitle(main) +
  theme(plot.title = element_text(hjust = 0, face = "bold")) +
  geom_hline(yintercept = 0, size = .2) + geom_vline(xintercept = 0, size = .2) +
  coord_equal() +
  geom_text(data = datapc, aes(x = v1, y = v2, label = varnames),
            size = 3, vjust = 0.3, color = "grey", fontface = "bold") +
  geom_segment(data = datapc, aes(x = 0, y = 0, xend = v1, yend = v2),
               color = "grey", linetype = "dotted") +
  scale_fill_manual(values = rep_len(c("red", "blue", "green", "orange", "yellow", "purple",
    "pink", "light blue", "white", "black", "gold"), length.out = len)) +
  guides(fill = guide_legend(override.aes = list(
    shape = rep_len(c(21, 22, 23, 24, 25), length.out = len))))

This produces the following plot: [image: PCA plot]

As you can see, the legend shows "two" as a green diamond, when in the plot it is a green square. Also, when the number of points (obsnames) happens to equal the number of shapes in my shape vector c(21, 22, 23, 24, 25), that is, 5, the problem doesn't appear. But I really don't see what I'm doing wrong...
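A likely cause, assuming ggplot2's default handling of discrete scales: the shape vector passed directly to geom_point() is applied in row order, while the legend keys (and therefore the unnamed values in scale_fill_manual() and override.aes) follow the factor levels of obsnames, which for a character column are sorted alphabetically. The base-R sketch below only lines up those two orderings for the example data; shape_by_row, shape_in_legend and legend_order are illustrative names, not part of the original code.

```r
# Example data (obsnames as in the question)
pca_data <- data.frame(
  obsnames = c("one", "two", "three"),
  PC1 = c(2.46, -3.47, 1.01),
  PC2 = c(2.57, 0.84, -3.40),
  stringsAsFactors = FALSE
)

shapes <- rep_len(c(21, 22, 23, 24, 25), length.out = nrow(pca_data))

# geom_point(shape = ...) hands out shapes in row order:
# one = 21, two = 22, three = 23
shape_by_row <- setNames(shapes, pca_data$obsnames)

# The legend orders its keys by factor level (alphabetical here):
# "one", "three", "two", so override.aes gives one = 21, three = 22, two = 23
legend_order <- levels(factor(pca_data$obsnames))
shape_in_legend <- setNames(shapes, legend_order)

print(shape_by_row)
print(shape_in_legend)
```

In this sketch "two" gets shape 22 (a square) in the plot but shape 23 (a diamond) as the third legend key, which matches the symptom above. One way to keep them in sync, offered as a sketch rather than a tested fix, is to also map shape = obsnames inside aes() and pass named vectors such as setNames(shapes, pca_data$obsnames) to both scale_shape_manual() and scale_fill_manual(), so each level receives the same value in the points and in the legend.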