1
votes

I am trying to create two graphs where the x-axis of both will be the Filename (see small example of data organization below).

For an individual Filename, there will be multiple occurrences of a particular phylum - each with a different M1 value.

    Phylum           M1  Filename
    Acidobacteria 55.75 4461130.3
    Acidobacteria 57.08 4461130.3
    Acidobacteria 54.61 4461125.3
    Acidobacteria 53.49 4461145.3
    Acidobacteria 57.99 4461145.3
    Acidobacteria 53.05 4461161.3
    Acidobacteria 51.03 4461161.3
    Acidobacteria 50.20 4461227.3
    Acidobacteria 51.88 4461227.3

Plot 1: the x-axis is the Filename; the y-axis is the number of times each phylum occurs in that Filename (e.g. Acidobacteria occurs 2 times for Filename 4461145.3).

Plot 2: the x-axis is the Filename; the y-axis is the mean value of M1 for each phylum that occurs in that Filename (e.g. for Acidobacteria (n=2), mean_M1 = 55.74 for Filename 4461145.3).

The points on the graph should be colored by Phylum, and for each Filename the sum and mean values for each phylum should fall on a vertical line. The example given contains only one Phylum name, which makes the request a bit trivial, but there are over 30 unique phyla in my dataset.

I can plot the raw M1 values for each phylum by filename (it colors properly too), but I cannot quite figure out the syntax for getting the sum and the mean values of M1. I was trying to use lapply, but I cannot figure out how to combine it with ggplot.

FN = "env.txt"
myDF = read.csv(FN, header=TRUE, sep=' ')

f <- qplot(Filename, M1, data=myDF)
f + geom_point(aes(colour=factor(Phylum))) + theme(axis.text.x=element_text(angle=90, hjust=1))

e <- qplot(Filename, mean(M1), data=myDF)
e + geom_point(aes(colour=factor(Phylum))) + theme(axis.text.x=element_text(angle=90, hjust=1))

g <- ggplot(myDF, aes(Filename, M1))
g + geom_point(aes(colour=factor(Phylum))) + theme(axis.text.x=element_text(angle=90, hjust=1))

p <- ggplot(myDF, aes(Filename, mean(M1)))
p + geom_point() + facet_grid(. ~ Phylum) + theme(axis.text.x=element_text(angle=90, hjust=1))

q <- qplot(Filename, M1, data=myDF, fun.y='mean')
q + geom_point() + facet_grid(. ~ Phylum) + theme(axis.text.x=element_text(angle=90, hjust=1))

Images of attempts can be seen here: http://imgur.com/srmR1rO The first one is roughly the right idea, but instead of having all the values of M1, I would like the mean. I have not attempted the summation problem.

Assistance greatly appreciated.


1 Answer

3
votes

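Calling mean(M1) inside qplot or aes computes a single overall mean, not one mean per Filename and Phylum, so the data has to be aggregated before plotting. Why not create a dataframe with the M1 means and sums, called myDF.agg, using plyr?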

library(plyr)
library(ggplot2)

FN = "env.txt"
myDF = read.csv(FN, header=TRUE, sep=' ')

# One row per Filename/Phylum pair, with the mean and the sum of M1
myDF.agg = ddply(myDF, .(Filename, Phylum), summarize, mean_M1 = mean(M1), sum_M1 = sum(M1))

# Plot the per-group means, colored by Phylum
e.mean <- qplot(Filename, mean_M1, data=myDF.agg)
e.mean + geom_point(aes(colour=factor(Phylum))) + theme(axis.text.x=element_text(angle=90, hjust=1))

# Plot the per-group sums, colored by Phylum
e.sum <- qplot(Filename, sum_M1, data=myDF.agg)
e.sum + geom_point(aes(colour=factor(Phylum))) + theme(axis.text.x=element_text(angle=90, hjust=1))
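
Note that sum_M1 adds up the M1 values, whereas Plot 1 in the question asks for the number of times each phylum occurs. Here is a minimal sketch of that variant. It assumes the sample rows from the question stand in for env.txt, and it uses length(M1) as the count. The names myDF.counts, n and p.count are made up for illustration. Converting Filename to a factor is also an assumption on my part: read.csv will probably read values like 4461130.3 as numbers, which would spread them along a continuous axis.

```r
library(plyr)
library(ggplot2)

# Sample rows copied from the question; the real data has 30+ phyla.
myDF <- data.frame(
  Phylum   = "Acidobacteria",
  M1       = c(55.75, 57.08, 54.61, 53.49, 57.99, 53.05, 51.03, 50.20, 51.88),
  Filename = c(4461130.3, 4461130.3, 4461125.3, 4461145.3, 4461145.3,
               4461161.3, 4461161.3, 4461227.3, 4461227.3)
)

# Assumption: treat Filename as a label rather than a number,
# so each file gets its own discrete position on the x-axis.
myDF$Filename <- factor(myDF$Filename)

# One row per Filename/Phylum pair:
# n = number of occurrences, mean_M1 = mean of M1 within the pair.
myDF.counts <- ddply(myDF, .(Filename, Phylum), summarize,
                     n = length(M1), mean_M1 = mean(M1))

# Plot 1: occurrences of each phylum per Filename, colored by Phylum.
p.count <- ggplot(myDF.counts, aes(Filename, n, colour = Phylum)) +
  geom_point() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
print(p.count)
```

Swapping n for mean_M1 in the aes() call should give Plot 2 from the same data frame.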