
I don't have much experience with functions in R. I'm trying to build one that calculates the mean of a target variable (in my example: funded_final) for each group.

My data:

residential_status  funded_final
Living with parents 0
Rent                0
Rent                0
Own                 1
Own                 0
Own                 0
Rent                0
Rent                0
Rent                0
Living with parents 0
Rent                0
Rent                0
Rent                1

When I do this outside a function, it works great:

test2 %>%
  group_by(residential_status) %>%
  summarise(tar_average = round((mean(funded_final, na.rm=TRUE))*100,2), N = n()) %>%
  arrange(desc(tar_average)) %>%
  mutate(Perc = round((N/sum(N))*100,2), Cum_Perc = cumsum(Perc)) %>%
  print(n = nrow(.))

The results:

 residential_status tar_average     N  Perc Cum_Perc
           <fctr>       <dbl> <int> <dbl>    <dbl>
1                 Own       33.33     3 23.08    23.08
2                Rent       12.50     8 61.54    84.62
3 Living with parents        0.00     2 15.38   100.00

When I wrap it in a function, every group just gets the overall average:

group.by.func <- function(dataframe, target){
  dataframe %>%
    group_by(residential_status) %>%
    summarise(tar_average = round((mean(target, na.rm=TRUE))*100,2), N = n()) %>%
    arrange(desc(tar_average)) %>%
    mutate(Perc = round((N/sum(N))*100,2), Cum_Perc = cumsum(Perc)) %>%
    print(n = nrow(.))
}
group.by.func(test2, test2$funded_final)

Results:

residential_status tar_average     N  Perc Cum_Perc
           <fctr>       <dbl> <int> <dbl>    <dbl>
1 Living with parents       15.38     2 15.38    15.38
2                 Own       15.38     3 23.08    38.46
3                Rent       15.38     8 61.54   100.00

Thanks in advance!

Answer:

The problem is that dplyr::summarise uses non-standard evaluation and expects column names unquoted, so that it can look them up inside the data. In your case, target is not a column name but a vector containing the values of the column (test2$funded_final). summarise has no way of associating that vector with the data.frame, so the grouping is not applied to it: for every group, the mean is taken over the entire vector target. That is why each group shows the overall average, 2/13 = 15.38%.
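To see this with the question's data, here is a minimal sketch that rebuilds test2 from the table in the question (as a plain data.frame with a character column rather than a factor, which is an assumption) and compares a free-standing vector against the column name inside the same grouped summarise().

```r
library(dplyr)

# The question's data, typed in from the table
test2 <- data.frame(
  residential_status = c("Living with parents", "Rent", "Rent", "Own", "Own",
                         "Own", "Rent", "Rent", "Rent", "Living with parents",
                         "Rent", "Rent", "Rent"),
  funded_final = c(0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1)
)

# Same situation as passing test2$funded_final into the function:
# a free-standing vector, not a column of the grouped data
target <- test2$funded_final

# mean(target) sees all 13 values in every group, so each group gets 2/13
res_vector <- test2 %>%
  group_by(residential_status) %>%
  summarise(tar_average = round(mean(target, na.rm = TRUE) * 100, 2))

# Naming the column lets summarise() use only the rows of each group
res_column <- test2 %>%
  group_by(residential_status) %>%
  summarise(tar_average = round(mean(funded_final, na.rm = TRUE) * 100, 2))

print(res_vector)
print(res_column)
```

res_vector repeats 15.38 for all three groups, while res_column gives 0, 33.33 and 12.5 (groups sorted alphabetically), matching the per-group averages from the code run outside the function.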

You could solve it by passing the column name as a string and using summarise_(), the 'standard evaluation' version of dplyr::summarise:

library(dplyr)

group.by.func <- function(dataframe, target){
    dataframe %>%
        group_by(residential_status) %>%
        # summarise_() takes each summary as a string, so the column name
        # passed in `target` is pasted into the expression before evaluation
        summarise_(.dots = list(
            tar_average = paste0("round((mean(", target, ", na.rm=TRUE))*100,2)"),
            N = "n()")) %>%
        arrange(desc(tar_average)) %>%
        mutate(Perc = round((N/sum(N))*100,2), Cum_Perc = cumsum(Perc)) %>%
        print(n = nrow(.))
}
group.by.func(test2, "funded_final")

Results:

# A tibble: 3 × 5
   residential_status tar_average     N  Perc Cum_Perc
               <fctr>       <dbl> <int> <dbl>    <dbl>
1                 Own       33.33     3 23.08    23.08
2                Rent       12.50     8 61.54    84.62
3 Living with parents        0.00     2 15.38   100.00
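summarise_() and the other underscore verbs have been deprecated since dplyr 0.7 in favour of tidy evaluation. Assuming dplyr 1.0 or later, the same function could be sketched in two ways: taking the column name as a string and looking it up with the .data pronoun, or taking an unquoted column name and forwarding it with the {{ }} ("embrace") operator. The function names group_by_target_str and group_by_target are illustrative only, and the grouping column is still hard-coded to residential_status as in the question.

```r
library(dplyr)

# The question's data, typed in from the table
test2 <- data.frame(
  residential_status = c("Living with parents", "Rent", "Rent", "Own", "Own",
                         "Own", "Rent", "Rent", "Rent", "Living with parents",
                         "Rent", "Rent", "Rent"),
  funded_final = c(0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1)
)

# Variant 1: column name as a string, looked up via the .data pronoun
group_by_target_str <- function(dataframe, target) {
  dataframe %>%
    group_by(residential_status) %>%
    summarise(tar_average = round(mean(.data[[target]], na.rm = TRUE) * 100, 2),
              N = n()) %>%
    arrange(desc(tar_average)) %>%
    mutate(Perc = round(N / sum(N) * 100, 2),
           Cum_Perc = cumsum(Perc))
}

# Variant 2: unquoted column name, forwarded with {{ }}
group_by_target <- function(dataframe, target) {
  dataframe %>%
    group_by(residential_status) %>%
    summarise(tar_average = round(mean({{ target }}, na.rm = TRUE) * 100, 2),
              N = n()) %>%
    arrange(desc(tar_average)) %>%
    mutate(Perc = round(N / sum(N) * 100, 2),
           Cum_Perc = cumsum(Perc))
}

res_str <- group_by_target_str(test2, "funded_final")
res_sym <- group_by_target(test2, funded_final)
print(res_sym, n = Inf)
```

Both calls should give the same three rows as above (Own 33.33, Rent 12.50, Living with parents 0.00). Unlike the original, these functions return the tibble instead of printing it inside the function, so the caller decides how to display or reuse the result.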