I would like to know how to wrap a data.table calculation in a user-defined function.

I wrote the following data.table code to calculate the percentage of 'b' responses out of all valid responses ('a' or 'b') within two grouping variables, grp1 and grp2.

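The data:

library(data.table)
# Recycle each vector explicitly to 100 rows. Relying on implicit recycling
# of unequal-length vectors gives a warning or an error, depending on the
# data.table version.
dt = data.table(grp1 = rep(c("I", "II", "III", "IV"), length.out = 100),
                grp2 = rep(c("A", "B", "C"), length.out = 100),
                Q1   = rep(c("a", "a", "b", "b", "b"), 20))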

The code to calculate % respondents:

dt[, sum(Q1 %in% "b")/sum(!is.na(Q1))*100, by = grp1:grp2][order(grp1, grp2)]

This produces what I need (thanks to @Frank for the help on "Calculate % respondents by more than one group for a survey data"):

    grp1 grp2       V1
 1:    I    A 55.55556
 2:    I    B 62.50000
 3:    I    C 62.50000
 4:   II    A 62.50000
 5:   II    B 55.55556
 6:   II    C 62.50000
 7:  III    A 50.00000
 8:  III    B 62.50000
 9:  III    C 66.66667
10:   IV    A 66.66667
11:   IV    B 62.50000
12:   IV    C 50.00000

I would like to wrap this in a function and use it to calculate the same percentages for 50 other items. I wrote the following function to reduce the repetition:

test = function(question, groupA, groupB){
  dt[, sum(get(question) %in% "b")/sum(!is.na(get(question)))*100, by = eval((c(groupA, groupB)))][order(groupA, groupB)]
  }

test(question = "Q1", groupA = "grp1", groupB ="grp2")

However, this returns only the top row:

   grp1 grp2       V1
1:    I    A 55.55556

I've read other questions on Stack Overflow (e.g. "Using data.table i and j arguments in functions") and tried other approaches, but I haven't been able to get it to work.

I'm new to R and would very much appreciate any feedback you may have.

1 Answer

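The grouping itself works; the problem is the trailing [order(groupA, groupB)]. Inside the function, groupA and groupB are the strings "grp1" and "grp2", not columns, so order() sorts two length-one vectors and returns 1, which keeps only the first row. The by argument also accepts a character vector directly, so eval() is not needed, and keyby groups and sorts in one step: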

test = function(question, groupA, groupB) {
  dt[, sum(get(question) %in% "b") / sum(!is.na(get(question))) * 100,
     keyby = c(groupA, groupB)]
}

ans = test(question = "Q1", groupA = "grp1", groupB ="grp2")
#     grp1 grp2       V1
#  1:    I    A 55.55556
#  2:    I    B 62.50000
#  3:    I    C 62.50000
#  4:   II    A 62.50000
#  5:   II    B 55.55556
#  6:   II    C 62.50000
#  7:  III    A 50.00000
#  8:  III    B 62.50000
#  9:  III    C 66.66667
# 10:   IV    A 66.66667
# 11:   IV    B 62.50000
# 12:   IV    C 50.00000
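
For the 50 other items mentioned in the question, one option is to compute all of them in a single grouped call instead of calling the function once per item. The following is only a sketch under some assumptions: the helper name pct_b, its arguments (data, questions, groups) and the second item Q2 are made up for illustration, and every item is assumed to use the same 'a'/'b' coding with NA for missing answers.

```r
library(data.table)

# Example data in the same shape as the question, plus a made-up item Q2
# (hypothetical, only here to show more than one question column).
dt = data.table(grp1 = rep(c("I", "II", "III", "IV"), length.out = 100),
                grp2 = rep(c("A", "B", "C"), length.out = 100),
                Q1   = rep(c("a", "a", "b", "b", "b"), 20),
                Q2   = rep(c("b", "a", NA, "b"), 25))

# Hypothetical helper: percentage of "b" among non-missing answers for
# every column named in `questions`, grouped and sorted by `groups`.
# The table is passed in as an argument instead of being read from the
# global environment.
pct_b = function(data, questions, groups) {
  data[, lapply(.SD, function(x) sum(x %in% "b") / sum(!is.na(x)) * 100),
       keyby = groups, .SDcols = questions]
}

res = pct_b(dt, questions = c("Q1", "Q2"), groups = c("grp1", "grp2"))
print(res)
```

The result has one row per grp1/grp2 combination and one column per item, so the full set of items can be requested by passing a longer character vector as questions.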