I need to create some box plots showing the abundance of some bacterial taxa in different samples. My data looks like:

  my.data <- "Taxon 06.TO.VG    21.TO.V 02.TO.VG    41.TO.VG    30.TO.V 04.BA.V 34.TO.VG    01.BA.V 28.TO.VG    18.TO.O 44.TO.V 08.BA.O 07.BA.O 06.BA.V 11.TO.V 06.BA.VG    07.BA.VG    05.BA.VG    07.BA.V 05.BA.V 06.BA.O 02.BA.O 04.BA.O 01.BA.O 05.BA.O 03.BA.O 02.BA.VG    03.BA.V 02.BA.V 04.BA.VG    03.BA.VG    01.BA.VG    15.TO.O 31.TO.O 09.TO.O 27.TO.V 42.TO.VG    08.TO.VG    16.TO.O 07.TO.V 13.TO.O 32.TO.V 29.TO.VG    10.TO.V 25.TO.V 05.TO.VG    20.TO.O 19.TO.V 17.TO.O 35.TO.V 43.TO.O 24.TO.V 26.TO.VG    01.TO.VG    37.TO.O 04.TO.VG    33.TO.O 39.TO.VG    14.TO.O 12.TO.O 38.TO.VG    22.TO.O
Bacteroides 0.072745558 0.011789182 0.028956894 0.059031877 0.097387173 0.086673889 0.432662192 0.060246679 0.269535674 0.152713335 0.014511873 0.063421323 0.091253905 0.139856373 0.013677012 0.200847907 0.180712032 0.21332737  0.031756181 0.272166702 0.019861211 0.133804422 0.168692685 0.100862392 0.152431791 0.104702194 0.119352089 0.410334347 0.024104844 0.0493905   0.068065382 0.047854785 0.011860175 0.168986083 0.015748031 0.407974482 0.264409881 0.250364431 0.330547112 0.536443695 0.578045113 0.400459167 0.204446209 0.357879234 0.242751388 0.488863722 0.521495803 0.001852281 0.045638126 0.503566932 0.069072806 0.171181339 0.183629007 0.371751412 0.385231317 0.023690205 0.255697356 0.104054054 0.242741552 0.043973941 0.221033868 0.004587156
Prevotella  0.073080791 0.302011096 0.586048042 0.487603306 0.290973872 0.014897075 0   0.333254269 0.029445074 0   0.153034301 0.002399726 0.025658188 0.090664273 0.440294582 0.100688924 0   0   0   0   0   0.000227946 0.093623374 0   0.000197707 0.115987461 0.076442171 0   0.047507606 0.000210172 0.000243962 0.042079208 0.52184769  0   0.394750656 0   0   0.235787172 0   0.000936856 0.000300752 0   0.051607781 0   0   0   0.002289494 0.735586941 0.023828756 0   0.011200996 0   0.046374105 0   0.00044484  0.085421412 0.000455789 0.306756757 0   0.11970684  0.008912656 0.371559633"

I'm wondering about using ggplot2 to draw the box plots, but I'm not sure how the data have to be formatted. I tried this:

df <- read.csv("my.data", header=T)
ggplot(data = df, aes(x=variable, y=value)) + geom_boxplot(aes(fill=Taxon))

but it gave me an error saying that the variable was not found. Can anyone help me?

Many thanks, Francesca

The error is pretty informative. Are the x values in your data called variable? If they are not, R will tell you it cannot find them. Also, your data look wide; they need to be long. Posting the output of dput(my.data) is much more productive than the format you have given your data in. - user1317221_G
Have a look at this tutorial. - Henrik

1 Answer

A quick example of how to format your data (long format: one row per observation, with one column for the category and one for the value):

categs = sample(LETTERS[1:3], 120, TRUE)
y = c(rnorm(40), rnorm(40, 3, 2), rnorm(40, 5, 3))

# example dataset
dados = data.frame(categs, y)

library(ggplot2)  # unlike require(), stops with an error if ggplot2 is not installed
ggplot(dados) + geom_boxplot(aes(x = categs, y = y))

# first rows of the example dataset, as printed by head(dados):
#  categs          y
#1      B  0.7392673
#2      B -0.1694076
#3      A -2.3804024
#4      B  0.5999949
#5      A  0.5816400
#6      A  2.1263669
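
Applied to a table shaped like the one in the question, the same idea might look like the sketch below. It is only an illustration under some assumptions: the small `wide` data frame is typed in by hand from the first three samples of the question, the names `wide`, `sample_cols` and `long` are made up for this example, and the column names `variable` and `value` are chosen to match the ones used in the question's ggplot call (they are also the default names that reshape2::melt produces). The reshaping uses base R only; the plot is drawn only if ggplot2 is installed.

```r
# Small wide table in the same layout as the question: one row per taxon,
# one column per sample (values copied from the first three samples).
wide <- data.frame(
  Taxon      = c("Bacteroides", "Prevotella"),
  `06.TO.VG` = c(0.072745558, 0.073080791),
  `21.TO.V`  = c(0.011789182, 0.302011096),
  `02.TO.VG` = c(0.028956894, 0.586048042),
  check.names = FALSE  # keep sample names that start with a digit unchanged
)

# Reshape to long format: one row per (Taxon, sample) pair.
# unlist() walks the sample columns one after another, so Taxon is
# repeated as a block and each sample name is repeated once per taxon.
sample_cols <- setdiff(names(wide), "Taxon")
long <- data.frame(
  Taxon    = rep(wide$Taxon, times = length(sample_cols)),
  variable = rep(sample_cols, each = nrow(wide)),
  value    = unlist(wide[sample_cols], use.names = FALSE)
)

# One box per taxon, summarising its abundance across all samples.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(long, aes(x = Taxon, y = value, fill = Taxon)) +
    geom_boxplot()
  print(p)
}
```

If the table is held in a character string as in the question, read.table(text = my.data, header = TRUE, check.names = FALSE) should parse it into the wide layout; read.csv("my.data") instead looks for a file called my.data on disk.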

See also http://ggplot2.org/