
As part of a larger function that keeps only the values in a plant-growth time series that occur before each individual's (plantid) injury, I'm writing two chunks of a function which, in order, will:

  1. Check that all variables given in the injuries argument are character vectors (in the second chunk, %in% doesn't recognise the named factors), and if not, convert them to character while issuing a warning.

  2. Identify and mark the rows in which any of these variables contains one of the strings in forbidden_values.
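For step 1, one option that avoids quasiquotation entirely is to pass the injury columns as plain strings and index the data frame with them. The sketch below is only an assumption about how that step could look: the helper name as_character_cols is hypothetical, and it applies as.character() to every listed column that is not already character, issuing a single warning that names the converted columns.

```r
# Hypothetical helper for step 1: make sure every column named in `cols`
# is a character vector, converting (with a warning) any that are not.
as_character_cols <- function(df, cols) {
  # Names of the listed columns that are not already character
  not_chr <- cols[!vapply(df[cols], is.character, logical(1))]
  if (length(not_chr) > 0) {
    warning("Converting to character: ", paste(not_chr, collapse = ", "))
    # as.character() turns factor values into their labels and keeps NA as NA
    df[not_chr] <- lapply(df[not_chr], as.character)
  }
  df
}

# Possible usage with the injury columns from the question:
# df <- as_character_cols(df, c("PrimaryInjury", "SecondaryInjury", "OtherInjury"))
```

Because the columns are selected with df[cols], the helper works the same whether the columns start out as factors, logicals (all NA) or characters.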

I'm fairly sure I'm getting something wrong with quotation/quasiquotation or with the bang-bang (!!)/big-bang (!!!) operators (this is my first time writing a function with quotation). I consistently get errors such as “Can't use !!! at top level”, and I'm not sure how to solve them. I also need help finding a good way to convert the variables that aren't character vectors.

This is what I've got so far:

Argument description

  • df: data.frame

  • plantid: unique identifier for each individual plant

  • year: year of observation

  • injuries: character vector naming the (in my case) 3 columns that can contain an injury code, e.g. c("PrimaryInjury", "SecondaryInjury", "OtherInjury")

  • forbidden_values: the injury codes of interest, e.g. c("Rust", "Insect", "Snow break")

Function

id_injured <- function(df, plantid, year, injuries, forbidden_values){
    #parsing unquoted strings.
    plantid <- enquo(plantid)
    year <- enquo(year)
    forbidden_values <- enquos(forbidden_values)
    injuries <- syms(injuries)

    #if all variables in injuries are not characters, stop and warn (attempt to convert to character those variables which are not character)

    if(!all(purrr::pmap_int(select(df, !!!injuries), ~is.character(...)))){
       stop("All injury variables are not characters. Convert factors in injuries to character variables")} else {
          (1) #Control to give output while testing function, replace with conversion and warning?
    }

    #Identify rows with matching injury codes with 1, else 0.
    Dataplantid <- df %>% mutate(is_injured = purrr::pmap_int(select(df, !!!injuries), any(c(...) %in% !!!forbidden_values)))

    #End of function
}


Intended use

I've removed part (1) of the function so that it will only try to mark 1 or 0.

Dataplantid <- id_injured(df=df, plantid=plantid, year=year, injuries=c("PrimaryInjury","SecondaryInjury","OtherInjury"),forbidden_values=c("Rust","Insect","Snow break"))

Result

Error: Can't use !!! at top level.

> last_trace()
<error/rlang_error>
Can't use `!!!` at top level.
Backtrace:
     █
  1. └─global::so_injured(...)
  2.   └─`%>%`(...)
  3.     ├─base::withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
  4.     └─base::eval(quote(`_fseq`(`_lhs`)), env, env)
  5.       └─base::eval(quote(`_fseq`(`_lhs`)), env, env)
  6.         └─`_fseq`(`_lhs`)
  7.           └─magrittr::freduce(value, `_function_list`)
  8.             ├─base::withVisible(function_list[[k]](value))
  9.             └─function_list[[k]](value)
 10.               ├─dplyr::mutate(...)
 11.               └─dplyr:::mutate.data.frame(...)
 12.                 ├─base::as.data.frame(mutate(tbl_df(.data), ...))
 13.                 ├─dplyr::mutate(tbl_df(.data), ...)
 14.                 └─dplyr:::mutate.tbl_df(tbl_df(.data), ...)
 15.                   └─rlang::enquos(..., .named = TRUE)
 16.                     └─rlang:::endots(...)
 17.                       └─rlang:::map(...)
 18.                         └─base::lapply(.x, .f, ...)
 19.                           └─rlang:::FUN(X[[i]], ...)
 20.                             └─rlang::splice(...)

Associated data

plantid <- rep(c(1,2,3,4,5), times=c(3,3,3,3,3))
year <- rep(1:3, length.out=length(plantid))
set.seed(42)
PrimaryInjury <- sample(c(NA,NA,NA,"Rust","Insect", "Snow break"), 15, replace=TRUE)
SecondaryInjury <- rep(NA, length.out=length(plantid)) #Filled with NA for example
OtherInjury <- rep(NA, length.out=length(plantid)) #Filled NA for example
df <- data.frame(plantid,year,PrimaryInjury,SecondaryInjury,OtherInjury)
#Right now, PrimaryInjury is a factor (the data.frame() default before R 4.0.0), while SecondaryInjury and OtherInjury are logical (all NA).

Expected output

Dataplantid <- df
Dataplantid$is_injured <- c(0,1,0,0,0,1,0,0,0,1,0,1,1,1,0)
First of all, your function misses a return argument. The new dataframe Dataplantid will not be saved in the global environment. Is your goal that the function returns a dataframe that marks whether a certain plant had a certain injury? Then you only need a mutate() statement with ifelse(). Also, the problem is not reproducible since you didn't show how you used the function. - MKR
@MKR, thanks for the response! I'll update the question to show how the function is used. Dataplantid will be returned but not saved, since it's the last object in the function environment, so I can assign the output to a name of my choice, yes? In the larger function, Dataplantid will be grouped by plantid and arranged by year; then all observations preceding the first injury will be marked with 0, else 1, and the result will be joined to the original dataframe as a new variable, "Before first injury". - Silviculturalist
@MKR Every function in R returns a value (not an “argument”!), including this one, though the unnecessary assignment inside the function is certainly misleading. Either way, this part of the code works. - Konrad Rudolph
@KonradRudolph could you share what you have? I can't get it to work. - Silviculturalist
@KonradRudolph my result is Error in c(...) %in% list(~c("Rust", "Insect", "Snow break")) : '...' used in an incorrect context - Silviculturalist
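Konrad Rudolph's point about return values can be checked directly in base R. In this small sketch (the functions f and g are purely illustrative), a function whose last expression is an assignment still returns the assigned value, only invisibly: nothing is printed at the console, but the caller can still assign the result.

```r
# Last expression is an assignment: its value is returned invisibly
f <- function() {
  x <- 1:3
}

# Last expression is the object itself: its value is returned visibly
g <- function() {
  x <- 1:3
  x
}

res <- f()                # nothing is printed, but res now holds 1:3
withVisible(f())$visible  # FALSE
withVisible(g())$visible  # TRUE
```

Both functions return the same value; the trailing assignment only changes whether it is auto-printed, which is why it is easy to misread as "no return value".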

Answer

There are a few problems, in order from least to most problematic:

  1. Use map_lgl instead of map_int for logical results.
  2. In particular, use map_lgl instead of pmap_int unless you actually intend to map across multiple arguments in parallel, which is not the case here.
  3. Do not assign the function result to a variable inside the function. It doesn’t do any real harm, but it’s unnecessary and misleading.
  4. Do not enquote and then interpolate the forbidden_values argument. You want a plain character vector here, not R names.
  5. You were missing a ~ in the purrr call to calculate is_injured.
  6. The logic to identify the injured values does not quite work like this; there may be a way of using pmap_lgl here but I think it’s more straightforward — albeit possibly more verbose — to reshape your data into long format, and work with that.
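As an alternative to both pmap_lgl and reshaping, here is a base-R sketch of the row-wise check in point 6. It is a swapped-in technique, not the approach used below: the hypothetical helper row_has_any tests each injury column separately with %in% and combines the per-column results with a logical OR via Reduce(). Column names are passed as strings, so no quasiquotation is needed.

```r
# Hypothetical helper: TRUE for each row in which at least one of the
# columns named in `cols` contains one of `forbidden_values`.
row_has_any <- function(df, cols, forbidden_values) {
  # One logical vector per column; as.character() makes factor columns
  # match by label, and NA cells never match (as long as
  # forbidden_values itself contains no NA)
  hits <- lapply(df[cols], function(col) as.character(col) %in% forbidden_values)
  # OR the per-column vectors together element-wise
  Reduce(`|`, hits)
}

# Toy data; stringsAsFactors = TRUE mimics the pre-R-4.0.0 default, so the
# first two columns become factors while OtherInjury stays logical
toy <- data.frame(
  PrimaryInjury   = c("Rust", NA, "Snow break", NA),
  SecondaryInjury = c(NA, "Insect", NA, NA),
  OtherInjury     = c(NA, NA, NA, NA),
  stringsAsFactors = TRUE
)
flag <- row_has_any(toy, names(toy), c("Rust", "Insect", "Snow break"))
flag              # TRUE TRUE TRUE FALSE
as.integer(flag)  # 1 1 1 0, the 0/1 coding used in the expected output
```

One caveat of this design: if cols is empty, Reduce() returns NULL rather than a logical vector, so a fuller implementation would want to guard against that.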

Put together, we get:

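library(dplyr)  # select(), mutate(), group_by(), summarize(), pull(), %>%
library(rlang)  # enquo(), syms()

id_injured <- function(df, plantid, year, injuries, forbidden_values) {
    plantid <- enquo(plantid)   # not used below; kept for the larger function
    year <- enquo(year)
    injuries <- syms(injuries)  # column names (strings) -> symbols for select()

    df_injuries <- select(df, !!! injuries)

    # Stop unless every injury column is a character vector
    if (! all(purrr::map_lgl(df_injuries, is.character))) {
        stop("All injury variables are not characters. Convert factors in injuries to character variables")
    }

    # Reshape to long format (one row per cell), then flag each original row
    # that holds at least one forbidden value
    is_injured <- df_injuries %>%
        mutate(.RowID = row_number()) %>%
        tidyr::gather(Key, Value, -.RowID) %>%
        group_by(.RowID) %>%
        summarize(is_injured = any(Value %in% forbidden_values)) %>%
        pull(is_injured)

    # Add the flag to the original data frame; this is the return value
    df %>% mutate(is_injured = is_injured)
}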