So i have a load of data which I have sampled as an example below:
Sequence Abundance Length
CAGTG 3 25
CGCTG 82 23
GGGAC 4 25
CTATC 16 23
CTTGA 14 25
CAAGG 9 24
GTAAT 5 24
ACGAA 32 22
TCGGA 10 22
TAGGC 30 21
TGCCG 25 21
TCCGG 2 21
CGCCT 22 24
TTGGC 4 22
ATTCC 4 23
I'm only showing the first 4 words of each sequence here, but in reality they are "Length" long. I am looking at the abundances of sequences for each size class that I have here. In addition, I want to visualise the proportion of abundance that a particular sequence represents within its size class. Currently, I can make a stacked bar graph like this:
ggplot(tab, aes(x=Length, y=Abundance, fill=Sequence))
+ geom_bar(stat='identity')
+ opts(legend.position="none")
This is fine for a small data set like this, but I have about 1.7 million rows in my actual data set. It looks very colourful and I can see that particular sequences hold a majority abundance in one size class but it is very messy.
I would like to be able to order the coloured stacked bars for each size by that sequence's abundance. i.e. the bars with the highest abundance within their stack are at the bottom of each stack and the bars with the lowest abundance are at the top. It should look a lot more presentable that way.
Any ideas on how to do this in ggplot2? I know there's an "order" parameter in the aes() but I can't work out what it should do with data in the format that I have.
plyr::count
) and then use map the order aesthetic to the count. Note that the order aesthetic is buggy in the cran version and you may need to use the dev version. – hadley