194
votes

I am interested in what is the "correct" way to write functions with optional arguments in R. Over time, I stumbled upon a few pieces of code that take a different route here, and I couldn't find a proper (official) position on this topic.

Up until now, I have written optional arguments like this:

fooBar <- function(x,y=NULL){
  if(!is.null(y)) x <- x+y
  return(x)
}
fooBar(3) # 3
fooBar(3,1.5) # 4.5

The function simply returns its argument if only x is supplied. It uses a default NULL value for the second argument and if that argument happens to be not NULL, then the function adds the two numbers.

Alternatively, one could write the function like this (where the second argument needs to be specified by name, but one could also unlist(z) or define z <- sum(...) instead):

fooBar <- function(x,...){
  z <- list(...)
  if(!is.null(z$y)) x <- x+z$y
  return(x)
}
fooBar(3) # 3
fooBar(3,y=1.5) # 4.5

Personally I prefer the first version. However, I can see good and bad with both. The first version is a little less prone to error, but the second one could be used to incorporate an arbitrary number of optionals.

Is there a "correct" way to specify optional arguments in R? So far, I have settled on the first approach, but both can occasionally feel a bit "hacky".

7
Check out the source code for xy.coords to see a commonly used approach.Carl Witthoft
The source code for xy.coords mentioned by Carl Witthoftl can be found at xy.coordsRubenLaguna

7 Answers

154
votes

You could also use missing() to test whether or not the argument y was supplied:

fooBar <- function(x,y){
    if(missing(y)) {
        x
    } else {
        x + y
    }
}

fooBar(3,1.5)
# [1] 4.5
fooBar(3)
# [1] 3
65
votes

To be honest I like the OP's first way of actually starting it with a NULL value and then checking it with is.null (primarily because it is very simply and easy to understand). It maybe depends on the way people are used to coding but the Hadley seems to support the is.null way too:

From Hadley's book "Advanced-R" Chapter 6, Functions, p.84 (for the online version check here):

You can determine if an argument was supplied or not with the missing() function.

i <- function(a, b) {
  c(missing(a), missing(b))
}
i()
#> [1] TRUE TRUE
i(a = 1)
#> [1] FALSE  TRUE
i(b = 2)
#> [1]  TRUE FALSE
i(1, 2)
#> [1] FALSE FALSE

Sometimes you want to add a non-trivial default value, which might take several lines of code to compute. Instead of inserting that code in the function definition, you could use missing() to conditionally compute it if needed. However, this makes it hard to know which arguments are required and which are optional without carefully reading the documentation. Instead, I usually set the default value to NULL and use is.null() to check if the argument was supplied.

26
votes

These are my rules of thumb:

If default values can be calculated from other parameters, use default expressions as in:

fun <- function(x,levels=levels(x)){
    blah blah blah
}

if otherwise using missing

fun <- function(x,levels){
    if(missing(levels)){
        [calculate levels here]
    }
    blah blah blah
}

In the rare case that you thing a user may want to specify a default value that lasts an entire R session, use getOption

fun <- function(x,y=getOption('fun.y','initialDefault')){# or getOption('pkg.fun.y',defaultValue)
    blah blah blah
}

If some parameters apply depending on the class of the first argument, use an S3 generic:

fun <- function(...)
    UseMethod(...)


fun.character <- function(x,y,z){# y and z only apply when x is character
   blah blah blah 
}

fun.numeric <- function(x,a,b){# a and b only apply when x is numeric
   blah blah blah 
}

fun.default <- function(x,m,n){# otherwise arguments m and n apply
   blah blah blah 
}

Use ... only when you are passing additional parameters on to another function

cat0 <- function(...)
    cat(...,sep = '')

Finally, if you do choose the use ... without passing the dots onto another function, warn the user that your function is ignoring any unused parameters since it can be very confusing otherwise:

fun <- (x,...){
    params <- list(...)
    optionalParamNames <- letters
    unusedParams <- setdiff(names(params),optionalParamNames)
    if(length(unusedParams))
        stop('unused parameters',paste(unusedParams,collapse = ', '))
   blah blah blah 
}
10
votes

There are several options and none of them are the official correct way and none of them are really incorrect, though they can convey different information to the computer and to others reading your code.

For the given example I think the clearest option would be to supply an identity default value, in this case do something like:

fooBar <- function(x, y=0) {
  x + y
}

This is the shortest of the options shown so far and shortness can help readability (and sometimes even speed in execution). It is clear that what is being returned is the sum of x and y and you can see that y is not given a value that it will be 0 which when added to x will just result in x. Obviously if something more complicated than addition is used then a different identity value will be needed (if one exists).

One thing I really like about this approach is that it is clear what the default value is when using the args function, or even looking at the help file (you don't need to scroll down to the details, it is right there in the usage).

The drawback to this method is when the default value is complex (requiring multiple lines of code), then it would probably reduce readability to try to put all that into the default value and the missing or NULL approaches become much more reasonable.

Some of the other differences between the methods will appear when the parameter is being passed down to another function, or when using the match.call or sys.call functions.

So I guess the "correct" method depends on what you plan to do with that particular argument and what information you want to convey to readers of your code.

9
votes

Just wanted to point out that the built-in sink function has good examples of different ways to set arguments in a function:

> sink
function (file = NULL, append = FALSE, type = c("output", "message"),
    split = FALSE)
{
    type <- match.arg(type)
    if (type == "message") {
        if (is.null(file))
            file <- stderr()
        else if (!inherits(file, "connection") || !isOpen(file))
            stop("'file' must be NULL or an already open connection")
        if (split)
            stop("cannot split the message connection")
        .Internal(sink(file, FALSE, TRUE, FALSE))
    }
    else {
        closeOnExit <- FALSE
        if (is.null(file))
            file <- -1L
        else if (is.character(file)) {
            file <- file(file, ifelse(append, "a", "w"))
            closeOnExit <- TRUE
        }
        else if (!inherits(file, "connection"))
            stop("'file' must be NULL, a connection or a character string")
        .Internal(sink(file, closeOnExit, FALSE, split))
    }
}
7
votes

I would tend to prefer using NULL for the clarity of what is required and what is optional. One word of warning about using default values that depend on other arguments, as suggested by Jthorpe. The value is not set when the function is called, but when the argument is first referenced! For instance:

foo <- function(x,y=length(x)){
    x <- x[1:10]
    print(y)
}
foo(1:20) 
#[1] 10

On the other hand, if you reference y before changing x:

foo <- function(x,y=length(x)){
    print(y)
    x <- x[1:10]
}
foo(1:20) 
#[1] 20

This is a bit dangerous, because it makes it hard to keep track of what "y" is being initialized as if it's not called early on in the function.

2
votes

how about this?

fun <- function(x, ...){
  y=NULL
  parms=list(...)
  for (name in names(parms) ) {
    assign(name, parms[[name]])
  }
  print(is.null(y))
}

Then try:

> fun(1,y=4)
[1] FALSE
> fun(1)
[1] TRUE