I'm trying to create a rule that assigns a specific color code to every unique string, so that ggplot2 graphs made from different files use consistent colors. For example, suppose I have two tab-delimited files, file1.txt and file2.txt, that look like this:
file1.txt
Freq Seq
90 AAGTGT
3 AAGTGG
3 AAGTCC
2 AATTTT
2 TTTTTT
file2.txt
Freq Seq
91 AAGTGT
4 AAGTGG
2 AAGTCC
2 CCCCCC
1 TTTTTT
A total of 6 different colors will be used for the above files, one for each of the 6 different sequences (AAGTGT, AAGTGG, AAGTCC, CCCCCC, TTTTTT, AATTTT). Across my many files, I need ~3000 colors, for which I've created a palette (pal) using randomColor from the randomcoloR package:
library(randomcoloR)
pal <- randomColor(count = 2951)
Is there a method to ensure that each sequence keeps the same hex color code across all of my files (i.e. that every file containing the AAGTGT sequence uses the same hex color code for that string)? Note that not all 3000 colors are represented in each file.
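To make the goal concrete, here is a minimal sketch of the kind of sequence-to-color lookup I'm after. It assumes the files have already been read into data frames; the names file1, file2, all_seqs and seq_colors are hypothetical, the data frames are typed in by hand to mirror the example files, and rainbow() is only a stand-in so the snippet runs without randomcoloR.

```r
# Hypothetical in-memory copies of file1.txt and file2.txt
file1 <- data.frame(Freq = c(90, 3, 3, 2, 2),
                    Seq  = c("AAGTGT", "AAGTGG", "AAGTCC", "AATTTT", "TTTTTT"),
                    stringsAsFactors = FALSE)
file2 <- data.frame(Freq = c(91, 4, 2, 2, 1),
                    Seq  = c("AAGTGT", "AAGTGG", "AAGTCC", "CCCCCC", "TTTTTT"),
                    stringsAsFactors = FALSE)

# Collect every unique sequence across all files, in a fixed (sorted) order
all_seqs <- sort(unique(c(file1$Seq, file2$Seq)))

# Stand-in palette (assumption); in practice this would be the randomColor() palette,
# with at least as many colors as there are unique sequences
pal <- grDevices::rainbow(length(all_seqs))

# Name each color by its sequence, so the pairing is fixed once for all files
seq_colors <- setNames(pal, all_seqs)

# Look up the color for each row by sequence name; sequences missing from a
# file simply go unused
file1$Color <- unname(seq_colors[file1$Seq])
file2$Color <- unname(seq_colors[file2$Seq])
```

If something like this is reasonable, the named vector could presumably be passed to scale_fill_manual(values = seq_colors) (or scale_colour_manual) for each plot, since ggplot2 matches a named values vector to the data by name rather than by position.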
Thanks!