I am trying to submit a package to CRAN. My main function was several thousand lines long, so I rewrote it as a wrapper ("outside") function that calls a set of unexported "inside" sub-functions. Each sub-function creates objects that I want to place in the wrapper function's environment. I have tried using either assign() or list2env(); the latter does the same thing, except that it takes a named list and creates one object per element, named after that element. When I run R CMD check on my package, it reports "no visible binding for global variable" NOTEs. Many variables are created in the sub-functions and written into the wrapper's environment from within them, and are then used in the wrapper without ever being visibly assigned there.
I have seen this question raised online before, but most of those discussions deal specifically with ggplot, dplyr, or subsetting/data.frame issues; my question is more general. Some references suggest using the utils::globalVariables function (https://github.com/r-lib/devtools/issues/1714) to declare up front the variables that will be created later. The forums suggest putting this call either in a separate globals.R script or at the beginning of the wrapper function. However, this solution seems to be controversial and is often called a "hack". Another solution (equally "hackish", but acceptable, I suppose) is simply to initialize all of these variables to NULL at the beginning of the wrapper function.
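A minimal sketch of the two workarounds just described, assuming the m/z example used in this question. In a real package the globalVariables() line would sit at the top level of a file such as R/globals.R (that file name is only a convention); it merely records the names for R CMD check and creates no objects. The NULL initialization instead gives the wrapper real local bindings that the sub-function later overwrites. The two options are alternatives; they are shown together only for comparison.

```r
# --- Option 1: declare the names once, e.g. at top level in R/globals.R ---
# Only silences the codetools check; m and z are still not created here.
utils::globalVariables(c("m", "z"))

# --- Option 2: create local placeholders in the wrapper ---
inside_function <- function() {
  # writes m and z into the caller's frame (the wrapper's environment)
  list2env(list(m = matrix(runif(4), ncol = 2), z = letters[1:10]),
           envir = parent.frame())
  invisible(NULL)
}

outside_function <- function() {
  m <- z <- NULL   # placeholders: codetools now sees m and z as locals
  inside_function() # replaces the placeholders with the real values
  list(m = m, z = z)
}

res <- outside_function()
str(res)
```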
Another solution I have seen is to store all of these objects as elements of a single list that is initialized in the wrapper function, and to have each sub-function return its outputs so that they can be added to the list or modify its elements. That way, the objects I want to create are not separate variables but elements of one list, so R CMD check has nothing to complain about. The downside is that I would need to rewrite my code significantly to refer to every object as a list element (e.g., tmp$obj rather than just obj). On the other hand, this is simpler in one respect: all the objects live in a single list that can be referred to and passed around as one unit, rather than having to be tracked individually.
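A sketch of the "one list for everything" approach, where each sub-function receives the shared list, adds or modifies elements, and returns it. The names step_one(), step_two(), and state are hypothetical. Because R copies on modify, a sub-function's changes reach the wrapper only through the returned value, which keeps the data flow explicit.

```r
# Hypothetical sub-functions: each takes the shared state list, adds or
# modifies some elements, and returns the updated list.
step_one <- function(state) {
  state$m <- matrix(runif(4), ncol = 2)
  state$z <- letters[1:10]
  state
}

step_two <- function(state) {
  # reads an earlier result and stores a new one
  state$total <- sum(state$m)
  # modifies an existing element
  state$z <- toupper(state$z)
  state
}

outside_function <- function() {
  state <- list(k = letters[17:23])  # everything lives in one object
  state <- step_one(state)
  state <- step_two(state)
  state                              # one object to return or pass on
}

out <- outside_function()
names(out)
```

Since every variable is an element of state, the wrapper contains no free variables for R CMD check to flag.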
I would like to hear from people with experience about the advantages, disadvantages, and correctness of these approaches.
Returning objects to the calling environment
outside_function <- function() {
  k <- letters[17:23]
  # inside_function() creates objects m and z, which did not exist before
  inside_function()
  print(ls())  # ls() does not auto-print inside a function
  print(m)
  print(z)
  inside_function()
  print(ls())
  # m and z have now been overwritten with new values
  print(m)
  print(z)
}
inside_function <- function() {
  m <- matrix(runif(4), ncol = 2)
  z <- letters[1:10]
  # assign to the wrapper's (i.e. the caller's) environment
  assign("m", m, envir = parent.frame())
  assign("z", z, envir = parent.frame())
  # an equivalent way (either one alone is enough):
  list2env(list(m = m, z = z), envir = parent.frame())
}
Alternative way, keeping objects as a list
outside_function <- function() {
  k <- letters[17:23]
  # inside_function() returns m and z as elements of a list
  tmp <- inside_function()
  # refer to m and z only as elements of tmp
  print(tmp$m)
  print(tmp$z)
  tmp <- inside_function()
  print(ls())  # ls() does not auto-print inside a function
  # tmp$m and tmp$z have now been overwritten
  print(tmp$m)
  print(tmp$z)
}
inside_function <- function() {
  m <- matrix(runif(4), ncol = 2)
  z <- letters[1:10]
  # return as list elements
  list(m = m, z = z)
}
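A possible middle ground, sketched here as an assumption rather than an established idiom: keep inside_function() returning a list, but copy the elements the wrapper uses often into ordinary local variables. Because m and z are then bound with `<-` inside outside_function(), codetools treats them as locals, and the rest of the wrapper can keep using the bare names instead of tmp$m and tmp$z.

```r
inside_function <- function() {
  list(m = matrix(runif(4), ncol = 2), z = letters[1:10])
}

outside_function <- function() {
  res <- inside_function()
  # explicit local bindings: visible to codetools, so no NOTE
  m <- res$m
  z <- res$z
  print(m)
  print(z)
  invisible(list(m = m, z = z))
}

out <- outside_function()
```

The cost is one extra line per unpacked object each time a sub-function is called, which may or may not be acceptable for a function with many outputs.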
For the first version, R CMD check reports the following NOTEs:
outside_function: no visible binding for global variable 'm'
outside_function: no visible binding for global variable 'z'
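These NOTEs come from the codetools package (a recommended package shipped with R), so they can be reproduced without a full R CMD check run. The sketch below calls codetools::checkUsage() on an assign()-based wrapper and on a NULL-initialized one; the function names outside_assign and outside_null are hypothetical. The check is purely static: the assign() version works at run time, but checkUsage() cannot see the bindings created through parent.frame().

```r
library(codetools)

inside_function <- function() {
  assign("m", matrix(runif(4), ncol = 2), envir = parent.frame())
  assign("z", letters[1:10], envir = parent.frame())
}

# assign()-based version: m and z are never bound inside this function
outside_assign <- function() {
  inside_function()
  print(m)
  print(z)
}

# NULL-initialized version: m and z are ordinary local variables
outside_null <- function() {
  m <- z <- NULL
  inside_function()
  print(m)
  print(z)
}

# checkUsage() reports via cat(), so capture its text output;
# invisible() keeps capture.output() from printing the return value
notes_assign <- capture.output(invisible(checkUsage(outside_assign, name = "outside_assign")))
notes_null   <- capture.output(invisible(checkUsage(outside_null, name = "outside_null")))
notes_assign
notes_null
```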