I am trying to use the dplyr package, but I am having trouble using column names that are stored in a variable.
Let's say I have a simplified data frame:
library(dplyr)
my.data <- as.data.frame(matrix(NA, ncol=4, nrow=6))
my.data <- as.data.frame(cbind(c("d6", "d7", "d8", "d9", "da", "db"), c(rep("C200", 2), rep("C400", 4)), c(rep("a",5), "b"), c("c", rep("a", 5))))
colnames(my.data) <- c("snp", "gene", "ind1", "ind2")
I first count the number of SNPs per gene with group_by:
new.data <- my.data %>% group_by(gene) %>% mutate(count = n())
Then I want the occurrence of the string "a" as a percentage per gene, for each individual column:
new.data %>% group_by(gene) %>% filter(grepl("a", ind1)) %>% dplyr::mutate(perc.a.ind1 = n()/count*100)
new.data %>% group_by(gene) %>% filter(grepl("a", ind2)) %>% dplyr::mutate(perc.a.ind2 = n()/count*100)
This works fine. However, I have many individuals and need to automate it, so I create a vector of column names and run the same code inside a for loop (I know a loop is not ideal; I would be happy to switch to an apply version or something else):
ind.vec <- colnames(my.data[,3:4])
for (i in seq_along(ind.vec)) {
new.data %>% group_by(gene) %>% filter(grepl("a", ind.vec[i])) %>% mutate(percent = n()/count*100)
}
I end up with an empty tibble, as if none of the elements of ind.vec were recognized as column names.
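Checking the filter condition on its own, outside the pipeline, seems to show where the empty result comes from. This is only a small base R sketch, with ind.vec written out by hand rather than taken from my.data: grepl() appears to be testing the column name as a string, not the contents of the column.

```r
# Column names written out by hand; same values as colnames(my.data[,3:4])
ind.vec <- c("ind1", "ind2")

# grepl() receives the string "ind1", which contains no "a"
grepl("a", ind.vec[1])   # FALSE

# The same holds for every name in the vector
grepl("a", ind.vec)      # FALSE FALSE
```

Since the condition is a single FALSE, filter() drops every row, which would explain the empty tibble.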
I read the vignette https://cran.r-project.org/web/packages/dplyr/vignettes/programming.html, which makes me think I have identified the problem, but I am far from understanding it and have been unable to make it work with my data.
I tried the following:
ind.vec <- quote(colnames(my.data[,3:4]))
new.data %>% group_by(gene) %>% filter(grepl("a", !!(ind.vec[i]))) %>% mutate(percent = n()/count*100)
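One direction that might fit is the .data pronoun, which lets a column be looked up by a string. The following is a minimal sketch under that assumption (it needs a dplyr version that provides .data); the names col and results are ones I made up for illustration, and I am not sure this is the idiomatic way to do it.

```r
library(dplyr)

# Same example data as above, built directly with data.frame()
my.data <- data.frame(
  snp  = c("d6", "d7", "d8", "d9", "da", "db"),
  gene = c(rep("C200", 2), rep("C400", 4)),
  ind1 = c(rep("a", 5), "b"),
  ind2 = c("c", rep("a", 5))
)
new.data <- my.data %>% group_by(gene) %>% mutate(count = n())

ind.vec <- c("ind1", "ind2")
results <- list()  # hypothetical container, one tibble per individual

for (col in ind.vec) {
  results[[col]] <- new.data %>%
    group_by(gene) %>%
    # .data[[col]] looks up the column whose name is stored in col
    filter(grepl("a", .data[[col]])) %>%
    mutate(percent = n() / count * 100)
}

results$ind1
```

With this, results$ind1 and results$ind2 are no longer empty, but I do not know whether this or the quote()/!! route is the recommended approach.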
How can I make dplyr recognize the elements of the vector as column names?
Could you help, please?