I need to perform some calculations on the columns of a tibble. I'm using mutate(across()) but I need to be able to pass the column names too. I have the following test data:

library(dplyr)  # also provides tibble() and %>%

mode <- c('PLDV','PLDV','PLDV')
var <- c('PMT','PMT','PMT')
city <- c('City1','City2','City3')
y2015 <- c(1000,2000,3000)
y2020 <- c(1500,2500,3500)
fuel <- c('SI','SI','SI')
scenario <- c('BAU','BAU','BAU')

test1 <- tibble(mode, var, city, y2015, y2020)
test2 <- tibble(scenario, mode, fuel, y2015, y2020)

yrs <- c("y2015","y2020")

The function is:

si_calc <- function(x, na.rm=FALSE)(
  pull(test1 %>% filter(mode=="PLDV",var=="PMT") %>%
         select(x) / 1000
  )
)

And the function call is:

test2 %>% filter(scenario=="BAU", mode=="PLDV", fuel=="SI") %>%
  mutate(across(yrs,si_calc))

I know that x holds the values of the column, but I also need to pass the column name. This seemed to work earlier with mutate_at(), but after I upgraded dplyr it no longer behaves the same way. The pull() is there because, when I had it semi-working, I needed to convert the returned data to a vector so that it would apply to multiple rows at once.

1
Try adding this: mutate(across(yrs,~si_calc(.))). Some columns are not present in your data! - Duck
Why are you applying to test2 a function that starts by pulling a column from test1? I don't understand what you are trying to do or what your expected output is. - Ric S
@Duck following your suggestion gives me the error: x Can't subset columns that don't exist. x Locations 1000, 2000, and 3000 don't exist. - Jason Hawkins
Your data does not have a column that is used in your function! - Duck
@RicS test1 contains data necessary to update test2. I can't merge them because the data varies between rows and I need to perform different operations on different rows of test1/test2. I'm a Python person but the work has to be done in R, so I'm sure there's a different way to do it. - Jason Hawkins

1 Answer

Thank you to @Duck for the suggestion of mutate(across(yrs,~si_calc(.))). dplyr also has context-dependent expressions that gave me what I was looking for (https://dplyr.tidyverse.org/reference/context.html). Calling cur_column() inside across() returns the name of the column currently being processed, so the function can look up that column in test1. The solution is:

si_calc <- function(x, na.rm = FALSE) {
  # cur_column() returns the name of the column across() is currently processing
  test1 %>%
    filter(mode == "PLDV", var == "PMT") %>%
    select(all_of(cur_column())) %>%
    pull() / 1000
}

test2 %>% filter(scenario=="BAU", mode=="PLDV", fuel=="SI") %>%
  mutate(across(all_of(yrs), si_calc))
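
For anyone who wants to check this end to end, here is a minimal self-contained sketch of the same idea. It assumes dplyr 1.0.0 or later (where across() and cur_column() are available) and that the filtered rows of test1 line up one-to-one, in the same order, with the filtered rows of test2. That holds for the test data but is an assumption for real data. The helper name lookup_scaled is hypothetical, not from the original code. It takes the lookup table as an argument instead of relying on a global test1.

```r
library(dplyr)

# Test data from the question (tibble() recycles the length-1 values)
test1 <- tibble(
  mode = "PLDV", var = "PMT",
  city = c("City1", "City2", "City3"),
  y2015 = c(1000, 2000, 3000),
  y2020 = c(1500, 2500, 3500)
)
test2 <- tibble(
  scenario = "BAU", mode = "PLDV", fuel = "SI",
  y2015 = c(1000, 2000, 3000),
  y2020 = c(1500, 2500, 3500)
)
yrs <- c("y2015", "y2020")

# Hypothetical helper: fetch column `col` from `lookup` and scale it.
# Assumes the filtered rows of `lookup` match the rows being mutated, in order.
lookup_scaled <- function(col, lookup) {
  rows <- filter(lookup, mode == "PLDV", var == "PMT")
  rows[[col]] / 1000
}

result <- test2 %>%
  filter(scenario == "BAU", mode == "PLDV", fuel == "SI") %>%
  # cur_column() gives the name of the column across() is working on
  mutate(across(all_of(yrs), ~ lookup_scaled(cur_column(), test1)))

print(result)
# y2015 becomes 1, 2, 3 and y2020 becomes 1.5, 2.5, 3.5
```

Wrapping yrs in all_of() makes it explicit that yrs is an external character vector of column names rather than a column of test2.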