
I'm doing PCA and I would like to plot the first principal component against the second in R:

pca <- princomp(~ ., data = data, na.action = na.omit)
plot(pca$scores[,1],pca$scores[,2])

or maybe several principal components:

pairs(pca$scores[,1:4])

However, the points are all black. How do I add color to these plots appropriately? How many colors do I need: one for each principal component I am plotting, or one for each row in my data matrix?

Thanks

EDIT:

My data looks like this:

> data[1:4,1:4]
                          patient1                     patient2                     patient3                     patient4
2'-PDE                    0.0153750                    0.4669375                   -0.0295625                    0.7919375
7A5                       2.4105000                    0.3635000                    1.8550000                    1.4080000
A1BG                      0.9493333                    0.2798333                    0.7486667                    0.7500000
A2M                       0.2420000                    1.0385000                    1.1605000                    1.6777500

So would this be appropriate?

plot(pca$scores[,1:4], pch=20, col=rainbow(dim(data)[1]))
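One detail worth checking in a call like this: `col` should have one entry per row of `pca$scores`, and because `na.action = na.omit` drops incomplete rows, that count can be smaller than `dim(data)[1]`. A minimal sketch on made-up data (the data frame `toy` is hypothetical) that sizes the color vector from the scores instead:

```r
# Hypothetical toy data: 10 rows, 3 numeric columns, one missing value
set.seed(1)
toy <- data.frame(a = rnorm(10), b = rnorm(10), c = rnorm(10))
toy$a[3] <- NA

# With na.action = na.omit, princomp() drops the incomplete row
pca_toy <- princomp(~ ., data = toy, na.action = na.omit)

nrow(toy)             # 10 rows in the input
nrow(pca_toy$scores)  # 9 rows of scores

# Size the color vector from the scores, not from the input data
point_cols <- rainbow(nrow(pca_toy$scores))
plot(pca_toy$scores[, 1], pca_toy$scores[, 2], pch = 20, col = point_cols)
```

`rainbow(n)` does give n distinct colors, but with hundreds of rows neighboring hues become hard to tell apart, which is why the comments below suggest coloring by group instead.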
One would normally color the points by membership of the samples in some groups (republicans/democrats, controls/exptl, old/new etc). You will need to make a vector of colors to go with your plot command: plot(pca$scores[,1], pca$scores[,2], col = c("red", "blue", ...)). Give us an idea of what you are doing and we can figure out an easy way to generate that vector. So your last idea is correct: one color for each row in your data set. – Bryan Hanson
Thanks, I think I can take it from there. I have a data matrix of patients and their gene expression for ~1000 genes. So maybe color each gene differently? – bdeonovic
1000 distinct colors? You have an overly optimistic view of the capacities of the human visual system. At any rate, see also stackoverflow.com/questions/13936051/… – IRTFM
DWin's link is brilliant. However, the actual answer (with illustration) is here: stackoverflow.com/a/13938639/181638 – Assad Ebrahim
If you have additional categorical information about your patients and the treatments that led to your microarray data, you might want to look at the lmdme package. It's a bit more than your original question calls for (it puts the focus on your experimental design instead of the particular genes), but it might be a helpful, different way of visualizing your results. – Bryan Hanson
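To illustrate the "one color per row, chosen by group" idea from the comments, here is a minimal sketch using R's built-in `iris` data as a stand-in (its `Species` column plays the role of a grouping such as controls/experimental; the palette is an arbitrary choice). Indexing a short palette by the group codes expands it to one color per row:

```r
# iris is a stand-in dataset; Species is the grouping factor
pca_iris <- prcomp(iris[, 1:4], scale. = TRUE)

# One color per group; indexing by the integer group codes gives one color per row
group_cols <- c("red", "green3", "blue")
row_cols <- group_cols[as.integer(iris$Species)]

plot(pca_iris$x[, 1], pca_iris$x[, 2], pch = 20, col = row_cols,
     xlab = "PC1", ylab = "PC2")
legend("topright", legend = levels(iris$Species), col = group_cols, pch = 20)
```

The same `row_cols` vector can be passed to `pairs(pca_iris$x[, 1:4], col = row_cols)`, since `pairs()` also expects one color per row.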

1 Answer


Here are some example PCA plots (adapted from an external source).

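# Simulate 2500 rows x 20 columns; each block of 500 rows comes from a different normal distribution
z1 <- rnorm(10000, mean=1, sd=1); z2 <- rnorm(10000, mean=3, sd=3); z3 <- rnorm(10000, mean=5, sd=5)
z4 <- rnorm(10000, mean=7, sd=7); z5 <- rnorm(10000, mean=9, sd=9)
mydata <- matrix(c(z1, z2, z3, z4, z5), 2500, 20, byrow=TRUE,
                 dimnames=list(paste("R", 1:2500, sep=""), paste("C", 1:20, sep="")))
pca <- prcomp(mydata, scale.=TRUE)  # scores go to pca$x, loadings (eigenvectors) to pca$rotation

summary(pca)
summary(pca)$importance[, 1:6]  # variance explained by the first six components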
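The "Proportion of Variance" row of `summary(pca)$importance` is each component's variance (`sdev^2`) divided by the total, rounded to five digits. A small sketch on hypothetical random data that recomputes it by hand and compares it with `summary()`:

```r
# Hypothetical small dataset: 40 rows, 5 columns
set.seed(42)
x <- matrix(rnorm(200), nrow = 40, ncol = 5)
p <- prcomp(x, scale. = TRUE)

# Variance of each PC divided by the total variance
var_explained <- p$sdev^2 / sum(p$sdev^2)
cum_explained <- cumsum(var_explained)

# Compare with the rows reported by summary()
round(var_explained, 5)
summary(p)$importance["Proportion of Variance", ]
round(cum_explained, 5)
summary(p)$importance["Cumulative Proportion", ]
```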

dev.new(width=12, height=6, pointsize=12); par(mfrow=c(1,2))  # wide plotting window with two panels side by side

mycolors <- c("red", "green", "blue", "magenta", "black")  # define plotting colors, one per group of 500 rows
plot(pca$x, pch=20, col=mycolors[sort(rep(1:5, 500))])       # PC1 vs PC2, one color per row

plot(pca$x, type="n"); text(pca$x, rownames(pca$x), cex=0.8, col=mycolors[sort(rep(1:5, 500))]) 
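The expression `mycolors[sort(rep(1:5, 500))]` builds exactly one color per row: 500 copies of each group index in order, matching the five blocks of 500 rows in `mydata`. `rep(1:5, each = 500)` produces the same index vector more directly, as this small check shows:

```r
mycolors <- c("red", "green", "blue", "magenta", "black")

# Two ways to build the group index 1,1,...,2,2,...,5 (500 of each)
idx_sorted <- sort(rep(1:5, 500))
idx_each   <- rep(1:5, each = 500)
identical(idx_sorted, idx_each)  # TRUE

# One color per row of mydata (2500 rows)
row_cols <- mycolors[idx_each]
length(row_cols)                 # 2500
table(row_cols)
```

A matching legend can then be added to either panel with `legend("topright", legend = paste("group", 1:5), col = mycolors, pch = 20)`.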

You can use pairs() to plot several principal components against each other; col again needs one color per row:

pairs(pca$x[,1:5], col = mycolors[sort(rep(1:5, 500))])

biplot(pca) plots the first two principal components together with the corresponding eigenvectors (the loadings stored in pca$rotation), drawn as arrows:

biplot(pca)
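To make the link between `pca$rotation` and eigenvectors concrete: when `prcomp()` is run with `scale. = TRUE`, the columns of `rotation` are eigenvectors of the correlation matrix and `sdev^2` are its eigenvalues. A small check on hypothetical random data (eigenvectors are only defined up to sign, hence the `abs()`):

```r
# Hypothetical data: 60 rows, 5 columns
set.seed(7)
x <- matrix(rnorm(300), nrow = 60, ncol = 5)
p <- prcomp(x, scale. = TRUE)
e <- eigen(cor(x))

# Loadings vs eigenvectors of the correlation matrix (compare up to sign)
all.equal(abs(unname(p$rotation)), abs(e$vectors))

# Squared PC standard deviations vs eigenvalues of the correlation matrix
all.equal(p$sdev^2, e$values)
```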

library(scatterplot3d) 
scatterplot3d(pca$x[,1:3], pch=20, color=mycolors[sort(rep(1:5, 500))]) 

The scatterplot3d() call above plots the first three principal components as a 3D scatter plot, colored by group.

library(rgl)
open3d()                                      # open a new rgl window
par3d(windowRect=c(50, 50, 690, 690))         # window position and size in pixels
clear3d()
view3d(theta=45, phi=30, fov=60, zoom=1)      # camera angle
spheres3d(pca$x[,1], pca$x[,2], pca$x[,3], radius=0.3,
          color=mycolors[sort(rep(1:5, 500))], alpha=1, shininess=20)  # one color per row
aspect3d(1, 1, 1)
axes3d(col='black'); title3d("", "", "PC1", "PC2", "PC3", col='black')

The latter creates an interactive 3D scatter plot with OpenGL; the rgl package needs to be installed for this. To save a snapshot of the graph, use rgl.snapshot("test.png").

ggpairs() from the GGally package draws a ggplot2-based version of the pairs plot:

library(GGally)
ggpairs(as.data.frame(pca$x[,1:5]))