I want to calculate the relative proportion by group for every column - except the grouping column - of a data frame. However, this should be programmed once to be used with different data frames which will have a different number of columns with different names. Because I am relying heavily on dplyr in this project, I want to achive this with dplyr.
I have read this topic, regarding a similiar but less complex problem:
Use dynamic variable names in `dplyr`
and also vignette("programming", "dplyr")
but I am still not able to set the quotation correctly. I am really stuck at this point and like to have some advice of more experienced developers.
To reproduce the problem, I have set up a minimal example with a data frame with randomly created data columns and a grouping column.
library(dplyr)
library(stringi)
df <- setNames(as.data.frame(matrix(sample(1:10, 999, replace = T), 333, 3)),
stri_rand_strings(3, 10, pattern = "[A-Za-z]"))
group <- c("group1","group2","group3")
df <- cbind(df, group)
The following function should achive two things:
- calculate the sum of every column in the data frame by group
- calculate the relative proportions of every column in the data frame by group
propsum <- function(df, expr){
expr_quo <- enquo(expr)
sum <- paste(quo_name(expr), "sum", sep = ".")
prop <- paste(quo_name(expr), "prop", sep = ".")
df %>%
group_by(., group) %>%
mutate(., !! sum := sum(!! expr_quo),
!! prop := expr / !! sum * 100) -> df
return(df)
}
for(i in length(df)-1){
propsum(df, names(df)[i]) -> df_new
}
The expected result is a data frame with the initial columns, the sums by group for every initial column and the relative proportions for every initial column by group. So in the example, the data frame should have 10 columns (1 goruping column, 3 initial data columns, 3 columns with sums by group, 3 columns with relative proportions by group).
However, I am getting the following error:
Error in sum(~names(df)[i]) : invalid 'type' (character) of argument
In the vignette, the code example for a similar task ist:
my_mutate <- function(df, expr) {
expr <- enquo(expr)
mean_name <- paste0("mean_", quo_name(expr))
sum_name <- paste0("sum_", quo_name(expr))
mutate(df,
!! mean_name := mean(!! expr),
!! sum_name := sum(!! expr)
)
}
my_mutate(df, a)
#> # A tibble: 5 x 6
#> g1 g2 a b mean_a sum_a
#> <dbl> <dbl> <int> <int> <dbl> <int>
#> 1 1 1 5 4 3 15
#> 2 1 2 3 2 3 15
#> 3 2 1 4 1 3 15
#> 4 2 2 1 3 3 15
#> # … with 1 more row
I tried a lot of different things as of now, but I am not able to get the RHS to use the correct column. What am I doing wrong?