1
votes

Question solved!

Question:

In R, I've been trying to find an elegant way to apply several functions, each with different arguments, to a list containing many tibbles/data.frames, but I'm struggling to pass the arguments through correctly. I'm cleaning and pre-processing pharmaceutical text data, and I've tried modify_if, invoke, map and more. Any help is greatly appreciated.

Note: only starting to learn programming, please forgive the naivety :)

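# Set up Example Data
library(dplyr)   # provides tibble()
library(purrr)   # provides modify_if()
library(stringr) # provides str_trim(), str_to_upper(), str_squish()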
Test_DataFrame <- tibble("Integer_Variable" = c(rep(x = 1:4))
             ,"Character_Variable" = c("tester to upper"
                          ,"test   squishing"
                          ,"canitcomprehend?.,-0(`kljndsfiuhaweraeriou140987645=Error?"
                          ,"         test white space triming      " ))

# modify_if works with a single function and its arguments:
# Modify character vectors by trimming the left side of the string
modify_if(.x = Test_DataFrame
      ,.p = is.character
      ,.f = str_trim
      , side = "left") # Works well
# Expected results
# A tibble: 4 x 2
#   Integer_Variable Character_Variable                                          
#              <int> <chr>                                                       
# 1                1 "tester to upper"                                           
# 2                2 "test   squishing"                                          
# 3                3 "canitcomprehend?.,-0(`kljndsfiuhaweraeriou140987645=Error?"
# 4                4 "test white space triming      "   
####### Note the trailing whitespace on the right, proving the argument is being applied!

However, when I try this with more than one function and any arguments, I hit a wall (the function arguments are ignored). I've tried many combinations of modify_if (some below) and other functions such as invoke (but it's being retired) and exec with map (which makes no sense to me). So far, no success. Any help is greatly appreciated.

# does not work
modify_if(.x = Test_DataFrame
      ,.p = is.character                # = the condition to specify which column to apply the functions to  
      ,.f = c(                      # a pairwise list of "name" = "function to apply" to apply to each column where the condition = TRUE
        UpperCase = str_to_upper        # Convert strings to upper case
        ,TrimLeadTailWhiteSpace = str_trim  # trim leading and ending whitespace
        ,ExcessWhiteSpaceRemover = str_squish)  # if you find any double or more whitespaces (eg "  " or "   ") then cut it down to " " 
      , side = "left"              # this argument is ignored
    )

# Does not work
modify_if(.x = Test_DataFrame
      ,.p = is.character
      ,.f = c(UpperCase = list(str_to_upper)    # the list variant doesn't work either
        ,TrimLeadTailWhiteSpace = list(str_trim, side = "left")
        ,ExcessWhiteSpaceRemover = list(str_squish))
    ) # returns the integer variable instead of the character so drastically wrong

# Set up Function - Argument Table
Function_ArgumentList <- tibble("upper" = list(str_to_upper)
                   ,"trim" = list(str_trim, side = "left")
                   ,"squish" = list(str_squish))

# Does not work
modify_if(.x = Test_DataFrame
      ,.p = is.character
      ,.f = Function_ArgumentList)
# Error: Can't convert a `tbl_df/tbl/data.frame` object to function
# Run `rlang::last_error()` to see where the error occurred.

I realise that the functions used in the above examples would work fine without arguments, but this is a simplified example of the problem I'm encountering.

Solution:

Thanks to @stefan and @BenNorris for the help below! To show @stefan's solution more clearly, I've slightly modified the answer to:

library(dplyr)
library(purrr)
library(stringr)
Test_DataFrame <- tibble("Integer_Variable" = c(rep(x = 1:4))
                        ,"Character_Variable" = c("tester to upper"
                                                ,"test   squishing"
                                                ,"canitcomprehend?.,-0(`kljndsfiuhaweraeriou140987645=Error?"
                                                ,"         test white space triming      " )
                        )
f_help <- function(x, side = "left") {
                str_to_upper(x) %>% 
                str_trim(side = side) # %>% 
                # str_squish()                # note that this is commented out
                }

modify_if(.x = Test_DataFrame
        ,.p = is.character
        ,.f = f_help
        ,side = "left") 
# A tibble: 4 x 2
#   Integer_Variable Character_Variable
#              <int> <chr>
# 1                1 "TESTER TO UPPER"
# 2                2 "TEST   SQUISHING"
# 3                3 "CANITCOMPREHEND?.,-0(`KLJNDSFIUHAWERAERIOU140987645=ERROR?"
# 4                4 "TEST WHITE SPACE TRIMING      "
# Note the right-sided whitespace is still present! It worked!!!
Comment (2 votes), by Parfait: No need for "Question solved!" at the beginning of the post, as the check mark below confirms resolution.

2 Answers

2
votes

As far as I understand it, there are two approaches to tackle this problem:

  1. Make use of a helper function
  2. Make use of purrr::compose
library(dplyr)
library(purrr)
library(stringr)

Test_DataFrame <- tibble("Integer_Variable" = c(rep(x = 1:4))
                         ,"Character_Variable" = c("tester to upper"
                                                   ,"test   squishing"
                                                   ,"canitcomprehend?.,-0(`kljndsfiuhaweraeriou140987645=Error?"
                                                   ,"         test white space triming      " ))

f_help <- function(x, side = "left") {
  str_to_upper(x) %>% 
    str_trim(side = side) %>% 
    str_squish()
}

modify_if(.x = Test_DataFrame,
          .p = is.character,
          .f = f_help, side = "left"
)
#> # A tibble: 4 x 2
#>   Integer_Variable Character_Variable                                        
#>              <int> <chr>                                                     
#> 1                1 TESTER TO UPPER                                           
#> 2                2 TEST SQUISHING                                            
#> 3                3 CANITCOMPREHEND?.,-0(`KLJNDSFIUHAWERAERIOU140987645=ERROR?
#> 4                4 TEST WHITE SPACE TRIMING

modify_if(.x = Test_DataFrame,
          .p = is.character,
          .f = purrr::compose(str_to_upper, ~ str_trim(.x, side = "left"), str_squish)
)
#> # A tibble: 4 x 2
#>   Integer_Variable Character_Variable                                        
#>              <int> <chr>                                                     
#> 1                1 TESTER TO UPPER                                           
#> 2                2 TEST SQUISHING                                            
#> 3                3 CANITCOMPREHEND?.,-0(`KLJNDSFIUHAWERAERIOU140987645=ERROR?
#> 4                4 TEST WHITE SPACE TRIMING
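
One thing worth knowing about the compose() approach: by default purrr::compose() applies its functions from right to left, so in the call above str_squish() actually runs first. The result is the same here because str_squish() strips both ends anyway, but the order can matter for other steps. Below is a minimal sketch of the difference; first_four is a hypothetical helper made up for illustration, not part of the original code.

```r
library(purrr)
library(stringr)

# Hypothetical helper for illustration: keep the first four characters.
first_four <- function(s) str_sub(s, 1, 4)

# Default direction ("backward"): the right-most function runs first,
# so first_four() is applied before str_trim().
backward <- compose(str_trim, first_four)

# .dir = "forward": functions run in the order they are written,
# so str_trim() is applied before first_four().
forward <- compose(str_trim, first_four, .dir = "forward")

x <- "  abcdef"
backward(x)  # cut to "  ab" first, then trimmed
forward(x)   # trimmed to "abcdef" first, then cut
```

If you want compose() to read in the same order as the piped helper function, passing .dir = "forward" is probably the least surprising choice.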
1
votes

The .f argument of modify_if() expects (according to its help file):

A function, formula, or vector (not necessarily atomic).
If a function, it is used as is.

If a formula, e.g. ~ .x + 2, it is converted to a function. 
There are three ways to refer to the arguments:

    For a single argument function, use .
    For a two argument function, use .x and .y
    For more arguments, use ..1, ..2, ..3 etc

This syntax allows you to create very compact anonymous functions.

If character vector, numeric vector, or list, it is converted to an extractor function. 
Character vectors index by name and numeric vectors index by position; use a list to index 
by position and name at different levels. If a component is not present, the value of 
.default will be returned.
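
To make the extractor conversion described in the help text concrete, here is a small sketch using plain map() and as_mapper(). The records list and get_value are made-up names for illustration only.

```r
library(purrr)

# Hypothetical nested data for illustration.
records <- list(
  list(name = "a", value = 1),
  list(name = "b", value = 2)
)

# A character .f is not called as a function; it is converted to an
# extractor that pulls out the element named "value" from each record.
values <- map(records, "value")

# as_mapper() performs the same conversion explicitly.
get_value <- as_mapper("value")
first_value <- get_value(records[[1]])
```

This is the same mechanism that kicks in when a list or vector is passed as .f to modify_if().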

So, if you supply a vector or list, modify_if tries to turn your values into an extractor (indexing by name or position) instead of calling them as functions, and fails. You have two choices. First, you can create your own custom function that does what you want:

custom_function <- function(x) {
  str_squish(str_trim(str_to_upper(x), side = "left"))
}
modify_if(.x = Test_DataFrame, 
          .p = is.character,              
          .f = custom_function
          )

Or you can write the function as an anonymous function.

modify_if(.x = Test_DataFrame, 
          .p = is.character,              
          .f = function(x) {
                           str_squish(str_trim(str_to_upper(x), side = "left"))
                           }
          )
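
If you would still like to keep the steps in a named list, as the question attempted, one possible sketch is to wrap each step so it carries its own arguments and then fold the list over the column with purrr::reduce(). The names cleaning_steps, apply_steps and example_df are assumptions made up for this example, not anything provided by purrr.

```r
library(purrr)
library(stringr)
library(tibble)

# Hypothetical list of cleaning steps; each wrapper fixes its own arguments.
cleaning_steps <- list(
  upper  = str_to_upper,
  trim   = function(s) str_trim(s, side = "left"),
  squish = str_squish
)

# Hypothetical helper: pass x through each step in list order.
apply_steps <- function(x, steps) {
  reduce(steps, function(acc, step) step(acc), .init = x)
}

# Small made-up data set for illustration.
example_df <- tibble(
  id   = 1:2,
  text = c("  tester to upper", "test   squishing  ")
)

# The extra argument `steps` is forwarded to apply_steps() through `...`.
cleaned <- modify_if(example_df, is.character, apply_steps, steps = cleaning_steps)
cleaned$text
```

Because modify_if() only ever sees a single function (apply_steps), the list is never mistaken for an extractor, and each step keeps its own arguments.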