3
votes

I am trying to write a function that uses dplyr to count the unique values of a column passed in as the argument z. The function works fine when the column is actually named z. However, if the column is named x, I get an error (code below).

test.data<-data.frame(y=c(1:10),
                  x=c(letters[1:10]))
test.data$x<-as.character(test.data$x)
obsfunction<-function(z,y,data){
filter_(data,
          !is.na(deparse(substitute(y))))%>%
    distinct_(., deparse(substitute(z)))%>% #the line that breaks it
    count_(.)
}
obsfunction(z=x,y,data=test.data)

So, the above code doesn't work and gives this error:

 >Error in eval(substitute(expr), envir, enclos) : unknown column 'z'

Changing z to x in the function (or renaming x as z) makes it work, but I don't want to have to rename everything, especially considering y works with different names.

I have tried lazyeval::interp and quote() per the vignette, this question, and this question.

distinct_(lazyeval::interp(as.name(z)))%>%
>Error in as.name(z) : object 'x' not found 

distinct_(quote(z))%>%
>Error in eval(substitute(expr), envir, enclos) : unknown column 'z' 

What am I missing? How do I get z to accept x as the column name?


3 Answers

3
votes

Since dplyr's standard-evaluation functions understand strings, I tried the following code, and with additional test data it seems to work. I first extract the variable names and then construct the expressions as character strings:

test.data<-data.frame(y=c(1:10),
                      x=c(letters[1:10]))
test.data$x<-as.character(test.data$x)

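f <- function(z, y, data){
    # capture the unquoted argument names as strings, e.g. "x" and "y"
    z <- deparse(substitute(z))
    y <- deparse(substitute(y))
    # build the filter condition as a string, e.g. "!is.na(y)"
    data %>% filter_(
        paste('!is.na(', y, ')', sep = '')) %>%
        distinct_(z) %>%
        count_(.)
}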


x <- f(z = x, y, test.data)
# # A tibble: 1 × 1
#       n
# <int>
# 1    10



test.data <- data.frame(
    y=c(1:4, NA, NA, 7:10),
    x=c(letters[c(1:8, 8, 8)]),
    stringsAsFactors = FALSE)

x <- f(z = x, y, test.data)
# # A tibble: 1 × 1
#       n
# <int>
# 1     6
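
To see what the two deparse(substitute()) lines produce on their own, here is a minimal base R sketch. The helper name build_filter_string is hypothetical and only for illustration; it is not part of dplyr.

```r
# Hypothetical helper: returns the strings that would be handed to
# distinct_() and filter_() in the function above.
build_filter_string <- function(z, y) {
  z_name <- deparse(substitute(z))   # unquoted argument x -> "x"
  y_name <- deparse(substitute(y))   # unquoted argument y -> "y"
  list(
    distinct_arg = z_name,
    filter_arg   = paste0("!is.na(", y_name, ")")
  )
}

strings <- build_filter_string(z = x, y = y)
print(strings$distinct_arg)  # "x"
print(strings$filter_arg)    # "!is.na(y)"
```

Note that x and y do not need to exist as objects here, because substitute() only looks at the unevaluated argument expressions.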
2
votes

You can use match.call to capture the function arguments and convert them to character strings before passing them to the dplyr SE (standard evaluation) functions:

obsfunction<-function(z, y, data){
    cl = match.call()
    y = as.character(cl['y'])
    z = as.character(cl['z'])

    data %>% filter_(paste('!is.na(', y, ')', sep = '')) %>%
             distinct_(z) %>%
             count_(.)
}

obsfunction(z = x, y = y, data = test.data)

# A tibble: 1 × 1
#      n
#  <int>
#1    10

obsfunction(x, y, test.data)

# A tibble: 1 × 1
#      n
#  <int>
#1    10
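
A small base R sketch of what match.call captures, in case it helps. The helper name captured_names is hypothetical, and it extracts each argument with `[[` rather than `[` as above, which I assume is equivalent for bare column names.

```r
# Hypothetical helper: returns the argument names captured by match.call().
captured_names <- function(z, y, data) {
  cl <- match.call()                 # the call as written, with full argument names
  c(z = as.character(cl[["z"]]),     # symbol x -> "x"
    y = as.character(cl[["y"]]))     # symbol y -> "y"
}

res <- captured_names(x, y, data = NULL)
print(res)
```

This only behaves well when a bare column name is passed. For an expression such as a + b, as.character() returns several strings, so deparse() would be the safer conversion there.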
1
votes

Another lazyeval/dplyr variation: the variables are passed as formulas, and f_interp substitutes uq(x) with the formula passed to it, similar to deparse(substitute(x)):

library(dplyr)
library(lazyeval)

test.data<-data.frame(y=c(1:10),
                  x=c(letters[1:10]))
test.data$x<-as.character(test.data$x)


obsfunction<-function(z, y, data){
  data %>% filter_(f_interp(~!is.na(uq(y)))) %>%
    distinct_(f_interp(~uq(z))) %>% count()
}

obsfunction(z=~x,~y,data=test.data)

 #A tibble: 1 × 1
 #     n
 #  <int>
 #1    10

test.data.NA <- data.frame(
  y=c(1:4, NA, NA, 7:10),
  x=c(letters[c(1:8, 8, 8)]),
  stringsAsFactors = FALSE)


obsfunction(z=~x,~y,data=test.data.NA)
 # # A tibble: 1 × 1
 #        n
 #      <int>
 # 1      6
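
The underscore verbs (filter_, distinct_, count_) have been deprecated since dplyr 0.7.0, which moved to a different approach for this. As a rough sketch only, assuming dplyr 1.0.0 or later is installed, the same function can be written with the embrace operator `{{ }}`. The name obsfunction_tidy is hypothetical.

```r
library(dplyr)

# Assumes dplyr >= 1.0.0, where {{ }} forwards an unquoted column name.
obsfunction_tidy <- function(data, z, y) {
  data %>%
    filter(!is.na({{ y }})) %>%   # drop rows where y is missing
    distinct({{ z }}) %>%         # keep one row per unique value of z
    count()                       # number of unique values
}

test.data.NA <- data.frame(
  y = c(1:4, NA, NA, 7:10),
  x = letters[c(1:8, 8, 8)],
  stringsAsFactors = FALSE)

result <- obsfunction_tidy(test.data.NA, z = x, y = y)
result$n  # 6
```

With this form the caller passes bare column names (z = x, y = y), with no formulas or strings.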