1
votes

I am trying to submit a package to CRAN. My function was several thousand lines long, so I rewrote it as a wrapper ("outside") function that calls a set of unexported "inside" sub-functions. Each sub-function creates objects that I want to make available in the wrapper function's environment. I have tried both assign() and list2env(); the latter does the same thing, except that it takes a named list and creates one object per element, named after that element. When I run R CMD check on my package, it raises "no visible binding for global variable" notes, because many variables are created in the sub-functions, written into the wrapper's environment from there, and then used in the wrapper without ever being explicitly created in it.

I have seen questions raised about this before online. Some of them deal specifically with ggplot, dplyr, or with subsetting or data.frame issues. This is more general. Some online references mention using utils::globalVariables() (https://github.com/r-lib/devtools/issues/1714) to declare up front the variables that I will create later. The forums mention putting this call either in a separate globals.R script or at the beginning of my wrapper function. But this solution seems to be controversial and is often called a "hack". Another solution (equally "hackish", but okay I suppose) is simply to initialize all these variables as NULL at the beginning of the wrapper.
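For reference, the globalVariables() route usually looks something like the sketch below. The file name R/globals.R and the variable names m and z are only assumptions taken from my example further down; the getRversion() guard is the commonly used idiom because globalVariables() was added in R 2.15.1.

```r
# Hypothetical contents of R/globals.R in the package.
# Declares the names that the sub-functions will later assign into the
# wrapper's environment, so the code checker stops flagging them.
if (getRversion() >= "2.15.1") {
  declared <- utils::globalVariables(c("m", "z"))
}
```

Note that the declaration applies to the whole package, not to one function, so a genuine typo that happens to match one of these names would no longer be flagged either.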

Another solution I have seen is to store all these objects as elements of a list that is initialized in the wrapper function, and to have each sub-function return its outputs so that they can be appended to the list or replace existing elements. This way the objects I want to create are not separate variables but elements of one list, so the check has nothing to complain about. However, I would then need to significantly rewrite my code to refer to every object as a list element (e.g., tmp$obj rather than just obj). On the other hand, this would be simpler in a way, because all the objects are stored in a list that can be referred to and passed around as a single unit, rather than having to keep track of them individually.
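A minimal sketch of that "append or modify" pattern, assuming two made-up sub-functions (step_one and step_two are hypothetical names, not from my package) and using utils::modifyList() to merge each result into the running list:

```r
# Each sub-function receives the current state and returns only the
# elements it wants to add or overwrite.
step_one <- function(state) {
  list(m = matrix(stats::runif(4), ncol = 2), z = letters[1:10])
}

step_two <- function(state) {
  # overwrites z and adds a new element n
  list(z = toupper(state$z), n = nrow(state$m))
}

wrapper <- function() {
  state <- list(k = letters[17:23])
  state <- utils::modifyList(state, step_one(state))
  state <- utils::modifyList(state, step_two(state))
  state
}

out <- wrapper()
names(out)
```

One thing to be aware of with modifyList(): by default a NULL element in the returned list removes that element from the state rather than storing NULL.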

I would like to hear from people with experience about the various advantages/disadvantages or correctness of these approaches.

Returning objects to environment

outside_function <- function() {
    k <- letters[17:23]
    #inside_function creates objects m and z which did not exist before
    inside_function()
    print(ls()) #ls() must be printed explicitly inside a function
    print(m)
    print(z)
    inside_function()
    print(ls())
    #z and m should now be overwritten
    print(m)
    print(z)
}

inside_function <- function() {
    m <- matrix(runif(4), ncol=2)
    z <- letters[1:10]

    #assign to the wrapping environment 
    assign("m", m, envir=parent.frame())
    assign("z", z, envir=parent.frame())
    #an equivalent way (either call alone is sufficient):
    list2env(list(m=m, z=z), envir=parent.frame())  

}

Alternative way, keeping objects as a list

outside_function <- function() {
    k <- letters[17:23]
    #inside_function creates objects m and z which did not exist before               
    tmp <- inside_function()

    #refer to m and z only as items in tmp
    print(tmp$m)
    print(tmp$z)

    tmp <- inside_function()
    print(ls())
    #z and m should now be overwritten
    print(tmp$m)
    print(tmp$z)
}

inside_function <- function() {
    m <- matrix(runif(4), ncol=2)
    z <- letters[1:10]

    #return as list items
    list(m=m, z=z)
}

For the first one, I get the following notes:

outside_function: no visible binding for global variable 'm'
outside_function: no visible binding for global variable 'z'
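As far as I understand, R CMD check produces these notes through static analysis from the codetools package, so they can be reproduced without a full check. The sketch below uses a cut-down copy of the wrapper and assumes codetools is installed (it normally ships with R as a recommended package):

```r
# Cut-down version of the wrapper: m and z are used but never assigned here.
outside_function <- function() {
  inside_function()
  print(m)
  print(z)
}

# findGlobals() only inspects the code; it does not run the function,
# so inside_function does not need to exist.
globals <- codetools::findGlobals(outside_function, merge = FALSE)
globals$variables
```

The analysis cannot see that inside_function() will create m and z at run time via assign(), which is why they are reported as global variables.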
2
This is subjective but I would definitely argue for the list approach. It seems much cleaner to me to have functions that are self-contained and return specific values, rather than having them mess with "global state". ("global state" is not strictly true here but that's how it would be described in most languages). - Marius
See my edits above. I figured out how to use environments. - Sam A.
Glad to hear you figured it out. You're allowed to answer your own question on Stack Overflow (I forget the exact rules but I think you can also "accept" it as the best answer after a short time), so feel free to post that as an answer. - Marius

2 Answers

1
votes

I have had this problem with a package I built whose sole purpose is to assign variables to environments. I feel your pain.

My solution was to initialize the variables as NULL. I wouldn't really call this hackish, as plenty of programming languages (the simplest I can think of off the top of my head is Visual Basic) require you to declare variables before they are used. Switching to a list isn't a bad idea, but as you say it requires a lot of refactoring and probably isn't worth your time.
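Applied to the example in the question, a minimal sketch of the NULL approach would look roughly like this (same function names as in the question; returning a list at the end is only there so the result can be inspected):

```r
inside_function <- function() {
  # write m and z into the caller's (the wrapper's) environment
  assign("m", matrix(stats::runif(4), ncol = 2), envir = parent.frame())
  assign("z", letters[1:10], envir = parent.frame())
  invisible(NULL)
}

outside_function <- function() {
  # placeholders: give the code checker a visible local binding
  m <- NULL
  z <- NULL
  inside_function()  # overwrites the placeholders
  list(m = m, z = z)
}

res <- outside_function()
str(res)
```

The placeholders do nothing at run time beyond being overwritten; they exist only so that m and z are visibly bound inside outside_function.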

0
votes

SOLUTION USING ENVIRONMENTS

So I figured out how to do this. Yes, you can use the list approach, but it is somewhat artificial. Here is what I consider the proper way: create a new, empty environment inside the wrapper function outside_function, and write to it all the objects that you want to store (and return at the end). This environment is then passed as a single argument (like a list) to the inside functions. Within inside_function, you can create and modify the objects stored in the environment in place, without having to return them in a list and copy them back in the wrapper. It is cleaner.

outside_function <- function() {
  
  myenv <- new.env(parent = emptyenv())
  #object k exists in local environment, but not myenv
  k <- LETTERS[17:23]
  #myenv starts out empty; inside_function() will write objects into it
  print(ls()) #two objects, k and myenv
  print(ls(myenv))

  print("first run")
  inside_function(env=myenv) 
  print("LS")
  print(as.list(myenv))
  print("second run")
  inside_function(env=myenv)
  print("LS")
  print(as.list(myenv))

  #inside here, have to refer to the stored objects as myenv$name
  #the command print(m) looks in this function's frame and then in its
  #enclosing environments; myenv is not one of them, so a bare m does
  #not find myenv$m (it errors unless some other m happens to exist)
  #try(print(m))
  #now create a local object m that is different
  m <- "blah"
  print(m) #gives 'blah'
  print(myenv$m)
  
  #return at end as a list
  invisible(as.list(myenv))
 
}  
inside_function <- function(env) {
  #create/overwrite objects in env
  
  env$m <- matrix(stats::runif(4), ncol=2)
  #the objects are modified in place in env, so inside_function does not
  #have to return env (note that it returns NULL invisibly)
  print(env$m)
  #overwrite
  env$m <- matrix(stats::runif(4), ncol=2)
  print(env$m)
  env$d <- 5
  print(env$d)
  env$d <- env$d + stats::runif(1)
  env$z <- letters[sample(1:20, size=6)]
  invisible(NULL)
}

tmp <- outside_function()
print(tmp) #contains all the objects as a list
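
The reason inside_function does not need to return anything is that environments are modified by reference, whereas a list passed to a function is modified as a copy. A small sketch of the difference (bump_list and bump_env are hypothetical helper names used only for this illustration):

```r
# Modifies its local copy of the list; the caller's list is untouched.
bump_list <- function(x) {
  x$n <- x$n + 1
  invisible(NULL)
}

# Modifies the environment itself; the caller sees the change.
bump_env <- function(e) {
  e$n <- e$n + 1
  invisible(NULL)
}

lst <- list(n = 0)
env <- new.env(parent = emptyenv())
env$n <- 0

bump_list(lst)
bump_env(env)

print(lst$n)  # unchanged
print(env$n)  # incremented
```

This is also the trade-off: with the environment, any sub-function can change any stored object as a side effect, while with the list every change has to come back through a return value.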