196
votes

EDIT: Hadley Wickham points out that I misspoke. R CMD check is throwing NOTES, not Warnings. I'm terribly sorry for the confusion. It was my oversight.

The short version

R CMD check throws this note every time I use sensible plot-creation syntax in ggplot2:

no visible binding for global variable [variable name]

I understand why R CMD check does that, but it seems to be criminalizing an entire vein of otherwise sensible syntax. I'm not sure what steps to take to get my package to pass R CMD check and get admitted to CRAN.

The background

Sascha Epskamp previously posted on essentially the same issue. The difference, I think, is that subset()'s manpage says it's designed for interactive use.

In my case, the issue is not over subset() but over a core feature of ggplot2: the data = argument.

An example of code I write that generates these notes

Here's a sub-function in my package that adds points to a plot:

JitteredResponsesByContrast <- function (data) {
  return(
    geom_point(
             aes(
               x = x.values, 
               y = y.values
             ),
             data     = data,
             position = position_jitter(height = 0, width = GetDegreeOfJitter(jj))
    )
  )
}

R CMD check, on parsing this code, will say

granovagg.contr : JitteredResponsesByContrast: no visible binding for
  global variable 'x.values'
granovagg.contr : JitteredResponsesByContrast: no visible binding for
  global variable 'y.values'

Why R CMD check is right

The check is technically correct. x.values and y.values

  • Aren't defined locally in the function JitteredResponsesByContrast()
  • Aren't pre-defined in the form x.values <- [something] either globally or in the caller.

Instead, they're variables within a dataframe that gets defined earlier and passed into the function JitteredResponsesByContrast().
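As far as I know, R CMD check relies on the codetools package for this analysis, so the detection can be reproduced outside a package. A minimal sketch using codetools::findGlobals(); the function name plot_layer is made up for illustration, and the function is only inspected, never called, so ggplot2 does not need to be loaded:

```r
library(codetools)

# Hypothetical plot-element function in the style of the question.
plot_layer <- function(data) {
  geom_point(aes(x = x.values, y = y.values), data = data)
}

# merge = FALSE separates global functions from global variables.
globals <- findGlobals(plot_layer, merge = FALSE)

# The column names show up as global variables because static analysis
# cannot know that aes() will look them up in `data` later.
print(globals$variables)
```

The argument data is not reported, since it is a formal of the function and therefore local.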

Why ggplot2 makes it difficult to appease R CMD check

ggplot2 seems to encourage the use of a data argument. The data argument, presumably, is why this code will execute

library(ggplot2)
p <- ggplot(aes(x = hwy, y = cty), data = mpg)
p + geom_point()

but this code will produce an object-not-found error:

library(ggplot2)
hwy # a variable in the mpg dataset
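One way to picture what the data argument does, as a rough base-R analogy rather than ggplot2's actual implementation: the expression given to aes() is captured unevaluated and only later evaluated with the data frame's columns in scope. The data frame df below is made up for illustration:

```r
# Hypothetical data standing in for mpg.
df <- data.frame(hwy = c(29, 31, 26), cty = c(18, 21, 20))

# Capture the expression without evaluating it.
expr <- quote(hwy)

# Evaluated with df supplying the variables, the column is found.
in_data <- eval(expr, df)

# Evaluated in an empty environment, the lookup fails with an
# object-not-found error; keep the message instead of stopping.
bare <- tryCatch(
  eval(expr, emptyenv()),
  error = function(e) conditionMessage(e)
)
```

This is why the name hwy is meaningful inside aes() but not at the top level, and also why a static checker that only sees the function body has no binding to point to.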

Two work-arounds, and why I'm happy with neither

The NULLing out strategy

Matthew Dowle recommends setting the problematic variables to NULL first, which in my case would look like this:

JitteredResponsesByContrast <- function (data) {
  x.values <- y.values <- NULL # Setting the variables to NULL first
  return(
    geom_point(
             aes(
               x = x.values, 
               y = y.values
             ),
             data     = data,
             position = position_jitter(height = 0, width = GetDegreeOfJitter(jj))
    )
  )
}

I appreciate this solution, but I dislike it for three reasons.

  1. It serves no additional purpose beyond appeasing R CMD check.
  2. It doesn't reflect intent. It raises the expectation that the aes() call will see our now-NULL variables (it won't), while obscuring the real purpose (making R CMD check aware of variables it apparently wouldn't otherwise know were bound).
  3. The problems of 1 and 2 multiply, because every time you write a function that returns a plot element, you have to add a confusing NULLing statement.

The with() strategy

You can use with() to explicitly signal that the variables in question can be found inside some larger environment. In my case, using with() looks like this:

JitteredResponsesByContrast <- function (data) {
  with(data, {
      geom_point(
               aes(
                 x = x.values, 
                 y = y.values
               ),
               data     = data,
               position = position_jitter(height = 0, width = GetDegreeOfJitter(jj))
      )
    }
  )
}

This solution works, but I don't like it because it doesn't work the way I would expect. If with() were really solving the problem of pointing the interpreter to where the variables are, then I shouldn't need the data = argument at all. But with() doesn't work that way:

library(ggplot2)
p <- ggplot()
p <- p + with(mpg, geom_point(aes(x = hwy, y = cty)))
p # will generate an error saying `hwy` is not found

So, again, I think this solution has similar flaws to the NULLing strategy:

  1. I still have to go through every plot element function and wrap the logic in a with() call
  2. The with() call is misleading. I still need to supply a data = argument; all with() is doing is appeasing R CMD check.

Conclusion

The way I see it, there are three options I could take:

  1. Lobby CRAN to ignore the notes by arguing that they're "spurious" (pursuant to CRAN policy), and do that every time I submit a package
  2. Fix my code with one of two undesirable strategies (NULLing or with() blocks)
  3. Hum really loudly and hope the problem goes away

None of the three makes me happy, and I'm wondering what people suggest I (and other package developers wanting to tap into ggplot2) should do. Thanks to all in advance. I really appreciate your even reading through this :-)

7
I like #1 and #3. - Ben Bolker
@BenBolker those are my go-to techniques too. - hadley
There is a 4th option: modify 'R CMD check' and submit a patch to r-devel for consideration. I suspect you'll find it's quite difficult (and possibly impossible) to detect which are spurious and which aren't. If anyone came up with a piece of code to do that, then ... - Matt Dowle
Another strategy is to use aes_string. - hadley
This seems to be a problem with transform and subset too (not 100% sure, but it makes sense). - BrodieG

7 Answers

46
votes

Have you tried aes_string instead of aes? This should work, although I haven't tried it:

aes_string(x = 'x.values', y = 'y.values')
89
votes

You have two solutions:

  • Rewrite your code to avoid non-standard evaluation. For ggplot2, this means using aes_string() instead of aes() (as described by Harlan)

  • Add a call to globalVariables(c("x.values", "y.values")) somewhere in the top-level of your package.

You should strive for 0 NOTES in your package when submitting to CRAN, even if you have to do something slightly hacky. This makes life easier for CRAN, and easier for you.
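A minimal sketch of the second option. The file name R/globals.R is hypothetical (any file under R/ that is sourced at the top level would do), and the version guard is only needed if the package must still load on R older than 2.15.1:

```r
# R/globals.R (hypothetical file name)
# Declare the column names used inside aes() so that R CMD check does not
# report them as undefined global variables.
if (getRversion() >= "2.15.1") {
  utils::globalVariables(c("x.values", "y.values"))
}
```

The declaration applies to the whole package, so a genuinely misspelled x.values elsewhere in the package would no longer be flagged either.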

(Updated 2014-12-31 to reflect my latest thoughts on this)

31
votes

This question was asked and answered a while ago, but for your information: since ggplot2 version 2.1.0 there is another way to get around the notes: aes_(x=~x.values,y=~y.values).

14
votes

In 2019, the best way to get around this is to use the .data pronoun from the rlang package. This tells R to treat x.values and y.values as columns in a data.frame (so it won't complain about undefined variables).

Note: This works best if you have predefined column names that you know will exist in your data input.

#' @importFrom rlang .data
my_func <- function(data) {
    ggplot(data, aes(x = .data$x, y = .data$y))
}
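Applied to the function from the question, this might look like the sketch below. The jitter position is left out because GetDegreeOfJitter() and jj are not defined in the question, and the data frame df is made up for illustration:

```r
library(ggplot2)

# In a package, `#' @importFrom rlang .data` in the roxygen block gives
# R CMD check a binding for `.data`.
JitteredResponsesByContrast <- function(data) {
  geom_point(
    aes(x = .data$x.values, y = .data$y.values),
    data = data
  )
}

# Hypothetical data for illustration.
df <- data.frame(x.values = c(1, 2, 3), y.values = c(2, 4, 6))
p <- ggplot() + JitteredResponsesByContrast(df)
```

Printing p draws the three points; the only free variable left in the function body is .data, which the import takes care of.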
13
votes

If getRversion() >= "3.1.0", you can add a call at the top level of the package:

utils::suppressForeignCheck(c("x.values", "y.values"))

from:

help("suppressForeignCheck")
7
votes

Add this line of code to the file in which you provide package-level documentation:

if(getRversion() >= "2.15.1")  utils::globalVariables(c("."))

Example here

0
votes

How about using get()?

geom_point(
         aes(
           x = get('x.values'), 
           y = get('y.values')
         ),
         data     = data,
         position = position_jitter(height = 0, width = GetDegreeOfJitter(jj))
)