I need to build a barplot of my data, showing bacterial relative abundance in different samples (each column should sum to 1 in the complete dataset).

A subset of my data:

> mydata
Taxon                                  CD6          CD1          CD12
Actinomycetaceae;g__Actinomyces        0.031960309  0.066683743  0.045638509
Coriobacteriaceae;g__Atopobium         0.018691589  0.003244536  0.00447774
Corynebacteriaceae;g__Corynebacterium  0.001846083  0.006403689  0.000516662
Micrococcaceae;g__Rothia               0.001730703  0.000426913  0.001894429
Porphyromonadaceae;g__Porphyromonas    0.073497173  0.065915301  0.175406872

What I'd like to have is a bar for each sample (CD6, CD1, CD12), where the y values are the relative abundance of bacterial species (the Taxon column).

I think (but I'm not sure) my data format is not right to do the plot, since I don't have a variable to group by like in the examples I found...

ggplot(data) + geom_bar(aes(x=revision, y=added), stat="identity", fill="white", colour="black")

Is there a way to rearrange my data so that it works as input to this code? Or how should I modify the code? Thanks!

1 Answer

Do you want something like this?

# sample data
df <- read.table(header=TRUE, sep=" ", text="
Taxon CD6 CD1 CD12
Actinomycetaceae;g__Actinomyces 0.031960309 0.066683743 0.045638509
Coriobacteriaceae;g__Atopobium 0.018691589 0.003244536 0.00447774
Corynebacteriaceae;g__Corynebacterium 0.001846083 0.006403689 0.000516662
Micrococcaceae;g__Rothia 0.001730703 0.000426913 0.001894429
Porphyromonadaceae;g__Porphyromonas 0.073497173 0.065915301 0.175406872")

# convert wide data format to long format (one row per taxon/sample pair)
library(reshape2)
df.long <- melt(df, id.vars="Taxon",
                measure.vars=grep("CD\\d+", names(df), value=TRUE),
                variable.name="sample",
                value.name="value")

# rescale the values within each sample so that they sum to 1
library(plyr)
df.long <- ddply(df.long, .(sample), transform, value=value/sum(value))

# order samples by their numeric id (CD1, CD6, CD12)
df.long$sample <- reorder(df.long$sample, as.numeric(sub("CD", "", df.long$sample)))

# plot using ggplot
library(ggplot2)
ggplot(df.long, aes(x=sample, y=value, fill=Taxon)) + 
  geom_bar(stat="identity") +
  # manual colors: one evenly spaced hue per taxon; unique() is used because
  # Taxon is read as character (not factor) by default since R 4.0
  scale_fill_manual(values=scales::hue_pal(h = c(0, 360) + 15,
                                           c = 100, 
                                           l = 65, 
                                           h.start = 0, 
                                           direction = 1)(length(unique(df$Taxon))))
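
The same reshape, rescale and reorder steps can also be written with tidyr and dplyr, in case reshape2 and plyr are not installed. This is only a sketch under that assumption, not part of the code above: the name df_long and the use of pivot_longer() and geom_col() are my substitutions, and the default fill palette is used instead of the manual one.

```r
library(tidyr)
library(dplyr)
library(ggplot2)

# same sample data as above
df <- read.table(header = TRUE, sep = " ", text = "
Taxon CD6 CD1 CD12
Actinomycetaceae;g__Actinomyces 0.031960309 0.066683743 0.045638509
Coriobacteriaceae;g__Atopobium 0.018691589 0.003244536 0.00447774
Corynebacteriaceae;g__Corynebacterium 0.001846083 0.006403689 0.000516662
Micrococcaceae;g__Rothia 0.001730703 0.000426913 0.001894429
Porphyromonadaceae;g__Porphyromonas 0.073497173 0.065915301 0.175406872")

# numeric ids of the sample columns, used to order the bars (1, 6, 12)
ids <- sort(as.numeric(sub("CD", "", setdiff(names(df), "Taxon"))))

df_long <- df %>%
  # wide to long: one row per taxon/sample pair
  pivot_longer(cols = matches("^CD\\d+$"),
               names_to = "sample", values_to = "value") %>%
  # rescale within each sample so that the values sum to 1
  group_by(sample) %>%
  mutate(value = value / sum(value)) %>%
  ungroup() %>%
  # order the samples by numeric id rather than alphabetically
  mutate(sample = factor(sample, levels = paste0("CD", ids)))

# geom_col() is shorthand for geom_bar(stat = "identity")
p <- ggplot(df_long, aes(x = sample, y = value, fill = Taxon)) +
  geom_col()
print(p)
```

Note that the rescaling step is only needed for a subset like this one; with the complete dataset, where each column already sums to 1, it leaves the values unchanged.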
