
I have been stuck on a problem for two or three days and I really hope someone can help me. I have a dataframe with the number of bacteria of each species in different samples, and I want to represent them in a stacked barplot (with ggplot()).

My dataframe looks like this:

> head(Table_count)


       taxonomy
    1        Bacteria(100);Proteobacteria(99);Alphaproteobacteria(99);Rhodospirillales(99);Acetobacteraceae(99);Roseomonas(97);
    2               Bacteria(100);Actinobacteria(100);Actinobacteria(100);Actinomycetales(100);Micrococcaceae(95);unclassified;
    3     Bacteria(100);Proteobacteria(100);Gammaproteobacteria(100);Pseudomonadales(100);Moraxellaceae(100);Enhydrobacter(95);
    4                      Bacteria(100);Bacteroidetes(100);Cytophagia(100);Cytophagales(100);Cytophagaceae(100);Rudanella(93);
    5                    Bacteria(100);Firmicutes(100);Bacilli(100);Bacillales(100);Staphylococcaceae(100);Staphylococcus(100);
    6 Bacteria(100);Proteobacteria(100);Alphaproteobacteria(100);Sphingomonadales(100);Sphingomonadaceae(100);Sphingobium(100);

      door_in_1 door_in_2 faucet_handle_1 faucet_handle_2 sink_floor_1 sink_floor_2 soap_dispenser_1
    1      3891      4689            6593            4266         3376         3477             3094
    2        26        90              28               6           48           38               56
    3         0         5              11               1           23           17               17
    4         0         0               3               0           20           11                1
    5         0         0               2               0           18           11                1
    6         0         0               0               0           16            7                0

      stall_in_1 toilet_floor_1 toilet_floor_2 toilet_flush_handle_1
    1       3462           3727           4110                  4470
    2         63             54             47                    76
    3          8             31             27                    23
    4          0             18             15                     2
    5          0             18             14                     1
    6          0             11              7                     0

      toilet_flush_handle_2 toilet_seat_1 toilet_seat_2
    1                  4682          3657          3696
    2                    35            36             7
    3                    15            10             6
    4                     3             1             0
    5                     3             1             0
    6                     2             1             0

1 row = 1 species (with the name in $taxonomy) and 1 column = 1 sample. The values in the cells are the number of individuals of each species in each sample.

My goal is "simple" (if I can say so >.>): I want to represent them like this: [barplot_schema], to get an overall view of the percentage that each species represents in the different samples.

I found a lot of ways to make a stacked barplot, but none of them really worked with my data...

Anyways, thanks for reading and I hope someone will help me ~

 R version 3.6.0 (2019-04-26)

[Example of the dataframe]

structure(list(taxonomy = c("Bacteria(100);Proteobacteria(99);Alphaproteobacteria(99);Rhodospirillales(99);Acetobacteraceae(99);Roseomonas(97);", 
    "Bacteria(100);Actinobacteria(100);Actinobacteria(100);Actinomycetales(100);Micrococcaceae(95);unclassified;", 
    "Bacteria(100);Proteobacteria(100);Gammaproteobacteria(100);Pseudomonadales(100);Moraxellaceae(100);Enhydrobacter(95);", 
    "Bacteria(100);Bacteroidetes(100);Cytophagia(100);Cytophagales(100);Cytophagaceae(100);Rudanella(93);", 
    "Bacteria(100);Firmicutes(100);Bacilli(100);Bacillales(100);Staphylococcaceae(100);Staphylococcus(100);", 
    "Bacteria(100);Proteobacteria(100);Alphaproteobacteria(100);Sphingomonadales(100);Sphingomonadaceae(100);Sphingobium(100);"
), door_in_1 = c(3891L, 26L, 0L, 0L, 0L, 0L), door_in_2 = c(4689L, 
90L, 5L, 0L, 0L, 0L), faucet_handle_1 = c(6593L, 28L, 11L, 3L, 
2L, 0L), faucet_handle_2 = c(4266L, 6L, 1L, 0L, 0L, 0L), sink_floor_1 = c(3376L, 
48L, 23L, 20L, 18L, 16L), sink_floor_2 = c(3477L, 38L, 17L, 11L, 
11L, 7L), soap_dispenser_1 = c(3094L, 56L, 17L, 1L, 1L, 0L), 
    stall_in_1 = c(3462L, 63L, 8L, 0L, 0L, 0L), toilet_floor_1 = c(3727L, 
    54L, 31L, 18L, 18L, 11L), toilet_floor_2 = c(4110L, 47L, 
    27L, 15L, 14L, 7L), toilet_flush_handle_1 = c(4470L, 76L, 
    23L, 2L, 1L, 0L), toilet_flush_handle_2 = c(4682L, 35L, 15L, 
    3L, 3L, 2L), toilet_seat_1 = c(3657L, 36L, 10L, 1L, 1L, 1L
    ), toilet_seat_2 = c(3696L, 7L, 6L, 0L, 0L, 0L)), row.names = c(NA, 
6L), class = "data.frame")

Can you provide a sample of your data using dput? stackoverflow.com/questions/49994249/example-of-using-dput - yusuzech
Yes, sorry I totally forgot. I fixed it :) - Thrylia
Is this what you are looking for? stackoverflow.com/questions/9563368/… (It looks like most of the bar for each source will be species 1?) What have you tried so far that didn't work? - Ben

1 Answer


Here is one possible way, but the stacked bar chart looks odd because the values in the location columns span such a wide range. I am only keeping rows with value > 9 so the chart does not look as odd as it did when I included all of them. Note that position="fill" then scales each bar over the remaining rows only.
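To see how wide that range is, it can help to turn the counts into per-sample percentages first. This is only a minimal sketch on two of your sample columns (values copied from your dput output); `counts` and `pct` are names I made up for the illustration.

```r
# Counts for two of the question's samples (values copied from the dput output);
# rows are species 1 to 6, columns are samples.
counts <- cbind(
  door_in_1    = c(3891, 26, 0, 0, 0, 0),
  sink_floor_1 = c(3376, 48, 23, 20, 18, 16)
)

# Convert each column to percentages of that sample's total (margin = 2 means by column).
pct <- 100 * prop.table(counts, margin = 2)

# Species 1 is the first row; it takes up almost the whole bar in both samples.
round(pct, 2)
```

Every column of `pct` sums to 100, and the first row is far above all the others, which is why the other species are barely visible in a stacked bar.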

For this example I named your data from dput df and labeled your species 1 to 6. After melting, the 6 taxonomy rows repeat once for each of the 14 sample columns, so the labels repeat 14 times:

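  library(reshape)   # melt()
  library(dplyr)     # %>%, select(), filter()
  library(ggplot2)   # ggplot(), geom_bar()

  # Melt to long format (one row per species/sample pair) and number the species 1 to 6
  new.df <- data.frame(species=rep(seq(1,6,1),14),
       melt(df, id.vars=c("taxonomy"))) %>% select(-taxonomy)

  # Keep counts above 9, then draw each bar scaled to proportions (position="fill")
  new.df %>% filter(value>9) %>%
       ggplot(aes(x=variable,y=value,fill=as.factor(species))) +
       geom_bar(position="fill",stat="identity") +
       theme(axis.text.x = element_text(angle=90, hjust=1))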

The outcome is:

[stacked bar plot]
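If numbers are hard to read in the legend, one option is to pull the last rank out of each taxonomy string and use that as the fill label. The sketch below is only an illustration, not the only way to do it: it uses a reduced copy of your data (three species, two samples) so that it runs on its own, it reshapes with base R instead of melt(), and `df_small`, `last_rank` and `long` are names I made up.

```r
library(ggplot2)

# Reduced copy of the question's data: first three species, two samples.
df_small <- data.frame(
  taxonomy = c(
    "Bacteria(100);Proteobacteria(99);Alphaproteobacteria(99);Rhodospirillales(99);Acetobacteraceae(99);Roseomonas(97);",
    "Bacteria(100);Actinobacteria(100);Actinobacteria(100);Actinomycetales(100);Micrococcaceae(95);unclassified;",
    "Bacteria(100);Proteobacteria(100);Gammaproteobacteria(100);Pseudomonadales(100);Moraxellaceae(100);Enhydrobacter(95);"
  ),
  door_in_1    = c(3891L, 26L, 0L),
  sink_floor_1 = c(3376L, 48L, 23L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: take the last ";"-separated rank and drop a trailing "(97)"-style score.
last_rank <- function(tax) {
  ranks <- strsplit(tax, ";", fixed = TRUE)
  vapply(ranks, function(r) sub("\\([0-9]+\\)$", "", r[length(r)]), character(1))
}

# Long format with base R: one row per species/sample pair.
sample_cols <- setdiff(names(df_small), "taxonomy")
long <- data.frame(
  genus  = rep(last_rank(df_small$taxonomy), times = length(sample_cols)),
  sample = rep(sample_cols, each = nrow(df_small)),
  count  = unlist(df_small[sample_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)

# position = "fill" rescales every bar to 1, so the y axis reads as a share of the sample.
p <- ggplot(long, aes(x = sample, y = count, fill = genus)) +
  geom_col(position = "fill") +
  scale_y_continuous(labels = function(x) paste0(100 * x, "%")) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
print(p)
```

On the full data you would use df in place of df_small. Rows whose last rank is "unclassified" all end up under the same label, so you may prefer a higher rank for those.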

Hope this is helpful.