I am working on a gene expression data frame called expression
. My samples belong to different subgroups, indicated in the colname (i.e. all samples that contain "adk" in their colname belong to the same subgroup)
adk1 adk2 bas1 bas2 bas3 non1 ...
gene1 1.1 1.3 2.2 2.3 2.8 1.6
gene2 2.5 2.3 4.1 4.6 4.2 1.9
gene3 1.6 1.8 0.5 0.4 0.9 2.2
...
I already defined subsets using
adk <- expression[grepl('adk', names(expression))]
I then did a PCA on this data set using
pca = prcomp (t(expression), center = F, scale= F)
I now want to plot the PCs I got from the PCA against each other in a PCA biplot. For this, I want all samples that belong to the same subgroup to have the same color (so e.g. all "adk" samples should be green, all "bas" samples should be red and all "non" samples should be blue). I tried to use the color
argument of the autoplot
function from ggfortify, but I wansn't able to make it use my defined subsets.
I would be glad if someone could help me with this! Thanks :)
Edit: I'd like to give you an example of what I want to do, using the USArrests dataset:
head(USArrests)
Murder Assault UrbanPop Rape
Alabama 13.2 236 58 21.2
Alaska 10.0 263 48 44.5
Arizona 8.1 294 80 31.0
Arkansas 8.8 190 50 19.5
California 9.0 276 91 40.6
Colorado 7.9 204 78 38.7
## Doing a PCA on the USArrests dataset
US.pca = prcomp(t(USArrests), center = F, scale = F)
## Now I can create a PCA biplot of PC1 and PC2 using the autoplot function (since I have ggfortify installed)
biplot1 = autoplot(US.pca,data=t(USArrests), x=1, y=2)
I want all samples that contain an "e" in their colname (in this case "Murder" and "Rape") to be the same color. The "UrbanPop" and the "Assault" sample should be an individual color as well. I hope this makes things a little clearer :)
P.S. I run R in the latest version of RStudio on Windows 10