2
votes

I am looking for a purrr solution to accomplish what I have done with a for loop. I suspect it will be a version of map2 or pmap. I've looked at the documentation and it makes sense for the simple cases provided, but I haven't been able to figure out how to use this for my case.

In the data frame below, the "Sum" columns contain monthly forecast for a given SKU. The "End of" columns contain the month-end date for the given month. The rule I am applying is that if the year (last two digits in the End of columns) is greater than 22, I replace the corresponding values in the Sum column for the given month/SKU with 0. The code I currently use (which works) is below.

sample.data.frame <- data.frame(SKU = c('ABC', 'DEF', 'GHI'), 
'Sum Month 01' = c(3, 9, 8), 'Sum Month 02' = c(5, 10, 4), 
'Sum Month 03' = c(9, 6, 5), 'Sum Month 04' = c(5, 9, 10), 
'End of Month 01' = rep('10/28/22', 3), 
'End of Month 02' = rep('12/02/22', 3), 
'End of Month 03' = rep('12/30/22', 3), 
'End of Month 04' = rep('01/27/23', 3), check.names = F)

for (i in seq_along(sample.data.frame[grepl("Sum Month", colnames(sample.data.frame))])){
    if (substr(sample.data.frame[grepl("End of", colnames(sample.data.frame))][[i]][[1]], 7, 8) > 22){
      sample.data.frame[grepl("Sum Month", colnames(sample.data.frame))][[i]] <- 0
    }
  
}

Additionally, I have written the following function based on the code above which works on a vector of dates and a vector of numbers. I can't figure out how to make it work over multiple vectors of dates and numbers (e.g., the various columns of the dataframe).

data.replace.fcn <- function(date.col, number.col){
  for (i in seq_along(date.col)) {
  if (substr(date.col[i], 7, 8) > 22) {
    number.col[i] <- 0
  } else {
    number.col[i] <- number.col[i]
     
  }
    
  }
  return(number.col)
}

Thanks in advance for your help.

3

3 Answers

2
votes

as an alternative to purrr, you could try to tranpose your data first into a tidy format? Then it would be just an if_else statement to get what you want.

library(tidyverse)

# using your input data.frame
sample.data.frame %>% 
  
  pivot_longer(
    cols = -SKU,
    names_to = c(".value", "Month"),
    names_pattern = "(.*) (\\d\\d)",
    names_transform = list(Month = readr::parse_integer),
    values_transform = list(`End of Month` = lubridate::mdy, `Sum Month` = as.integer)
  ) %>% 
  
  mutate(`Sum Month` = if_else(lubridate::year(`End of Month`) > 2022, 0L, `Sum Month`))
2
votes

You can use map2 from purrr here operating on "Sum Month" and "End of" column in pairs.

library(dplyr)
library(purrr)

map2_df(select(sample.data.frame, starts_with('Sum Month')), 
        select(sample.data.frame, starts_with('End of')), 
        ~if_else(substr(.y, 7, 8) > 22, 0, .x))

#   `Sum Month 01` `Sum Month 02` `Sum Month 03` `Sum Month 04`
#           <dbl>          <dbl>          <dbl>          <dbl>
#1              3              5              9              0
#2              9             10              6              0
#3              8              4              5              0

The same can be written in base R, using mapply/Map :

sum_month <- grep('Sum Month', names(sample.data.frame))
end_month <- grep('End of', names(sample.data.frame))

sample.data.frame[sum_month] <- Map(function(x, y) 
               ifelse(substr(y, 7, 8) > 22, 0, x), 
                sample.data.frame[sum_month], sample.data.frame[end_month])
1
votes

The easiest way, in my opinion, is to transform the data into a tidy format (each variable is a column, each observation is a row), for which most functions in the tidyverse are designed:

library(tidyverse)
sample.data.frame_long <- sample.data.frame %>% 
  pivot_longer(cols = c(starts_with("Sum Month"), starts_with("End of Month")),
               names_to = c(".value", "Month"),
               names_pattern =  "(\\w+).*?(\\d+)")
sample.data.frame_long
#> # A tibble: 12 x 4
#>    SKU   Month   Sum End     
#>    <chr> <chr> <dbl> <chr>   
#>  1 ABC   01        3 10/28/22
#>  2 ABC   02        5 12/02/22
#>  3 ABC   03        9 12/30/22
#>  4 ABC   04        5 01/27/23
#>  5 DEF   01        9 10/28/22
#>  6 DEF   02       10 12/02/22
#>  7 DEF   03        6 12/30/22
#>  8 DEF   04        9 01/27/23
#>  9 GHI   01        8 10/28/22
#> 10 GHI   02        4 12/02/22
#> 11 GHI   03        5 12/30/22
#> 12 GHI   04       10 01/27/23

Once this is done it is a lot easier to compare the two variables:

sample.data.frame_long %>% 
  mutate(Sum = ifelse(
    as.numeric(str_extract(End, "\\d+$")) > 22, 0, Sum))
#> # A tibble: 12 x 4
#>    SKU   Month   Sum End     
#>    <chr> <chr> <dbl> <chr>   
#>  1 ABC   01        3 10/28/22
#>  2 ABC   02        5 12/02/22
#>  3 ABC   03        9 12/30/22
#>  4 ABC   04        0 01/27/23
#>  5 DEF   01        9 10/28/22
#>  6 DEF   02       10 12/02/22
#>  7 DEF   03        6 12/30/22
#>  8 DEF   04        0 01/27/23
#>  9 GHI   01        8 10/28/22
#> 10 GHI   02        4 12/02/22
#> 11 GHI   03        5 12/30/22
#> 12 GHI   04        0 01/27/23

Or converting to a proper date first:

sample.data.frame_long%>% 
  mutate(End2 = lubridate::mdy(End)) %>% 
  mutate(Sum = ifelse(End2 > "2022-12-31", 0, Sum))
#> # A tibble: 12 x 5
#>    SKU   Month   Sum End      End2      
#>    <chr> <chr> <dbl> <chr>    <date>    
#>  1 ABC   01        3 10/28/22 2022-10-28
#>  2 ABC   02        5 12/02/22 2022-12-02
#>  3 ABC   03        9 12/30/22 2022-12-30
#>  4 ABC   04        0 01/27/23 2023-01-27
#>  5 DEF   01        9 10/28/22 2022-10-28
#>  6 DEF   02       10 12/02/22 2022-12-02
#>  7 DEF   03        6 12/30/22 2022-12-30
#>  8 DEF   04        0 01/27/23 2023-01-27
#>  9 GHI   01        8 10/28/22 2022-10-28
#> 10 GHI   02        4 12/02/22 2022-12-02
#> 11 GHI   03        5 12/30/22 2022-12-30
#> 12 GHI   04        0 01/27/23 2023-01-27