I have a data frame of taxonomic variables that looks like this (but longer).
taxTest <- structure(list(Kingdom = structure(c(1L, 1L, 1L, 1L, 1L), .Label = "Bacteria", class = "factor"),
Phylum = structure(c(2L, 1L, 1L, 1L, 1L), .Label = c("Bacteroidetes",
"Proteobacteria"), class = "factor"), Class = structure(c(2L,
1L, 1L, 1L, 1L), .Label = c("Bacteroidia", "Gammaproteobacteria"
), class = "factor"), Order = structure(c(2L, 1L, 1L, 1L,
1L), .Label = c("Bacteroidales", "Enterobacteriales"), class = "factor"),
Family = structure(c(2L, 1L, 3L, 1L, 3L), .Label = c("Bacteroidaceae",
"Enterobacteriaceae", "Prevotellaceae"), class = "factor"),
Genus = structure(c(2L, 1L, 3L, 1L, 3L), .Label = c("Bacteroides",
"Escherichia/Shigella", "Prevotella"), class = "factor"),
Genus.y = structure(c(NA, 1L, 2L, 1L, 2L), .Label = c("Bacteroides",
"Prevotella"), class = "factor"), Species = structure(c(1L,
4L, 2L, 5L, 3L), .Label = c("albertii/boydii/coli/coli,/dysenteriae/enterica/fergusonii/flexneri/sonnei/vulneris",
"copri", "disiens", "dorei", "dorei/vulgatus"), class = "factor")), .Names = c("Kingdom",
"Phylum", "Class", "Order", "Family", "Genus", "Genus.y", "Species"
), row.names = c("tax1", "tax2", "tax3", "tax4", "tax5"), class = "data.frame")
I want to come up with a short taxonomic name from this data, so I run a function that is slightly more complicated than this one (it has to deal with NA data in a bunch of these taxonomic levels) but fails in the same way.
library(dplyr)
tag_taxon <- function(tvdf){
  species <- tvdf %>% dplyr::select(Species) %>% unlist
  genus2 <- tvdf %>% dplyr::select(Genus, Genus.y) %>% unlist
  genus <- genus2 %>% na.omit %>% .[1]
  #genus <- tvdf %>% dplyr::select(Genus) %>% unlist
  out <- paste(genus, species)
  out
}
If I run this function against each row of the table, I get the answer I expect: a genus and species name.
for(i in 1:5){
print(taxTest %>% .[i,] %>% tag_taxon)
}
[1] "Escherichia/Shigella albertii/boydii/coli/coli,/dysenteriae/enterica/fergusonii/flexneri/sonnei/vulneris"
[1] "Bacteroides dorei"
[1] "Prevotella copri"
[1] "Bacteroides dorei/vulgatus"
[1] "Prevotella disiens"
I feel like I should be able to use dplyr to apply this function over each row of the data frame. Unfortunately, this returns counter-intuitive results.
taxTest %>% rowwise %>% tag_taxon
'Escherichia/Shigella albertii/boydii/coli/coli,/dysenteriae/enterica/fergusonii/flexneri/sonnei/vulneris' 'Escherichia/Shigella dorei' 'Escherichia/Shigella copri' 'Escherichia/Shigella dorei/vulgatus' 'Escherichia/Shigella disiens'
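The output above looks consistent with tag_taxon being called once on the whole table rather than once per row: the first non-NA genus is a single value, and paste() recycles it across every species. Here is a minimal base-R sketch of that behaviour. The `toy` data frame and its values are made up for illustration and are not the real data.

```r
# Toy table (hypothetical values, not the real taxonomy data)
toy <- data.frame(
  Genus   = c("Escherichia", "Bacteroides", "Prevotella"),
  Genus.y = c(NA, "Bacteroides", "Prevotella"),
  Species = c("coli", "dorei", "copri"),
  stringsAsFactors = FALSE
)

# What the function body computes when it is handed all rows at once
genus2 <- unlist(toy[, c("Genus", "Genus.y")])  # all Genus values, then all Genus.y values
genus  <- na.omit(genus2)[1]                    # one value: the genus from the first row
whole_table <- paste(genus, toy$Species)        # the single genus is recycled over every species

# The same logic applied to one row at a time, as in the for loop
per_row <- vapply(seq_len(nrow(toy)), function(i) {
  row <- toy[i, ]
  g <- na.omit(unlist(row[, c("Genus", "Genus.y")]))[1]
  paste(g, row$Species)
}, character(1))

print(whole_table)
print(per_row)
```

In this sketch `whole_table` repeats the first genus for every species, which matches the pattern in the rowwise output, while `per_row` pairs each genus with its own species.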
I thought maybe the apply function might also work here, but this just outright fails with a cryptic error message.
apply(taxTest, 1, tag_taxon)
Error in UseMethod("select_"): no applicable method for 'select_' applied to an object of class "character"
Traceback:
- apply(taxTest, 1, tag_taxon)
- FUN(newX[, i], ...)
- tvdf %>% dplyr::select(Species) %>% unlist # at line 4 of file
- withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
- eval(quote(`_fseq`(`_lhs`)), env, env)
- eval(quote(`_fseq`(`_lhs`)), env, env)
- `_fseq`(`_lhs`)
- freduce(value, `_function_list`)
- function_list[i]
- dplyr::select(., Species)
- select.default(., Species)
- select_(.data, .dots = compat_as_lazy_dots(...))
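The 'applied to an object of class "character"' part of the error fits how apply() works on a data frame: it converts the table to a matrix first, so each row reaches the function as a named character vector, not a one-row data frame. The sketch below illustrates this on a made-up `toy` table. `tag_by_name` is a hypothetical helper that indexes by name instead of calling select(), and it leaves out the Genus.y fallback.

```r
# Toy table (hypothetical values) with explicit row names, like the real one
toy <- data.frame(
  Genus   = c("Bacteroides", "Prevotella"),
  Species = c("dorei", "copri"),
  row.names = c("tax1", "tax2"),
  stringsAsFactors = FALSE
)

# apply() hands each row to the function as a character vector
row_classes <- apply(toy, 1, class)

# A function that indexes the vector by name does not need select()
tag_by_name <- function(row) paste(row[["Genus"]], row[["Species"]])
tags <- apply(toy, 1, tag_by_name)

print(row_classes)
print(tags)
```

Because the rows arrive as plain character vectors, any dplyr verb inside the function has no data frame method to dispatch on, which would explain the failure at the first select() call.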
Any ideas about what is going on here? I can totally solve this problem with a for loop, but I'd rather use dplyr if I can.
Thanks!
Edit: One more thing! I forgot to mention in my original post that if I un-comment the #genus <- tvdf %>% dplyr::select(Genus) %>% unlist line (that is, I don't try to append the species information to the genus information), the dplyr function gives the expected results.
Instead of taxTest %>% select(Species) %>% unlist, you can do taxTest %>% pull(Species). – eipi10
With dplyr loaded, there's no need to use dplyr::select instead of just select. – eipi10