
I am new to R and new to this community.

I have to deal with a lot of data at the moment and I am trying to make my life easier, so I want to create a pipeline where I can upload a file and get plots, pie charts, statistical analyses, and a PCA as output.

Within my data sets I have 8 recurring categorical variables. Let's call them A-H. To save time, I would like to assign a fixed colour to each of them, e.g. A = blue, B = white, etc.

plot(sample.tsv$Annotation, col = c("blue", "white", "lightblue", "green", "purple", "red", "black", "yellow"))

My call currently looks like the one above. It works well for the first dataset, but if B is missing from the next dataset (which can happen), the colours get mixed up, because they are assigned by position rather than by category. Is there an easy solution I have missed? I have been searching for three hours and have not found anything on the websites that really helped. Thank you in advance!

Answer


Using match, we can create a map that associates a fixed color with each variable name.
Here is an example that should clarify the idea.
We start by considering two datasets that partially share the same variable names.

set.seed(1)
df1 <- as.data.frame(matrix(rnorm(100),ncol=5))
df2 <- as.data.frame(matrix(rnorm(100),ncol=5))
names(df2) <- c("V1","V2","V4","V6","V5")

names(df1)
[1] "V1" "V2" "V3" "V4" "V5"
names(df2)
[1] "V1" "V2" "V4" "V6" "V5"

Now we generate a vector containing all the variable names and a vector of associated colors,

all.vars <- unique(c(names(df1),names(df2)))
all.cols <- rainbow(length(all.vars))

and then we use match to look up the color of each variable name:

( cols.df1 <- all.cols[match(names(df1), all.vars)] )
[1] "#FF0000FF" "#FFFF00FF" "#00FF00FF" "#00FFFFFF" "#0000FFFF"
( cols.df2 <- all.cols[match(names(df2), all.vars)] )
[1] "#FF0000FF" "#FFFF00FF" "#00FFFFFF" "#FF00FFFF" "#0000FFFF"
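The same idea can be applied to the situation in the question. A minimal sketch, assuming the categories are stored in a factor column such as sample.tsv$Annotation: define the eight colours once as a named vector (the names annot.cols, annotation and bar.cols and the example data are made up for illustration) and index it by the categories actually present. Indexing a named vector by names performs the same lookup as match, so a missing category such as B no longer shifts the colours of the others.

```r
# Hypothetical fixed colour map for the eight recurring categories A-H:
# the names are the category labels, the values are the colours.
annot.cols <- c(A = "blue", B = "white", C = "lightblue", D = "green",
                E = "purple", F = "red", G = "black", H = "yellow")

# Made-up example data in which category B does not occur.
annotation <- factor(c("A", "C", "C", "D", "H", "A", "G"))

# Look the colours up by name, in the order of the factor levels,
# instead of relying on the position of each colour in the vector.
# This is equivalent to annot.cols[match(levels(annotation), names(annot.cols))].
bar.cols <- annot.cols[levels(annotation)]

# plot() on a factor draws a bar chart with one bar per level.
plot(annotation, col = bar.cols)
```

A category that does not appear in annot.cols would get NA as its colour, so the named vector should list every category that can occur. In the ggplot2 example that follows, cols.df1 and cols.df2 play the role of bar.cols.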

Finally, we can use these colors to plot the two datasets. Here we use ggplot2:

library(reshape)
df1m <- melt(cbind(df1,id=1:nrow(df1)),id.vars="id")
df2m <- melt(cbind(df2,id=1:nrow(df2)),id.vars="id")

library(ggplot2)
ggplot(aes(x=id, y=value, col=variable),data=df1m)+geom_line(lwd=1)+
  scale_color_manual(values=cols.df1)


ggplot(aes(x=id, y=value, col=variable),data=df2m)+geom_line(lwd=1)+
  scale_color_manual(values=cols.df2)

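Because the lookup only produces an ordinary character vector of colors, it is not tied to ggplot2. As a sketch for base graphics (which the question uses), the same cols.df1 and cols.df2 can be passed to matplot, which draws column j of its input with the j-th color, so each variable keeps its color across both plots. The data setup is repeated so the block runs on its own; the side-by-side layout and legends are illustrative choices.

```r
# Recreate the example data and the color lookup from above.
set.seed(1)
df1 <- as.data.frame(matrix(rnorm(100), ncol = 5))
df2 <- as.data.frame(matrix(rnorm(100), ncol = 5))
names(df2) <- c("V1", "V2", "V4", "V6", "V5")

all.vars <- unique(c(names(df1), names(df2)))
all.cols <- rainbow(length(all.vars))
cols.df1 <- all.cols[match(names(df1), all.vars)]
cols.df2 <- all.cols[match(names(df2), all.vars)]

# Base-graphics equivalent: matplot() draws one line per column,
# using col[j] for column j, so the colors follow the variable names.
op <- par(mfrow = c(1, 2))
matplot(df1, type = "l", lty = 1, lwd = 2, col = cols.df1, main = "df1")
legend("topright", legend = names(df1), col = cols.df1, lty = 1, lwd = 2)
matplot(df2, type = "l", lty = 1, lwd = 2, col = cols.df2, main = "df2")
legend("topright", legend = names(df2), col = cols.df2, lty = 1, lwd = 2)
par(op)
```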