I have a function that I want to apply to only certain rows of my dataset, saving the results in a column of that dataset.
For example, say I have this setup:
library(tidyverse)
add_one <- function(vector, x_id) {
  return(vector[x_id] + 1)
}
test <- data.frame(x = c(1,2,3,4), y = c(1,2,3,4), run_on = c(TRUE,FALSE,TRUE,FALSE))
test
So the test data frame looks like:
> x y run_on
>1 1 1 TRUE
>2 2 2 FALSE
>3 3 3 TRUE
>4 4 4 FALSE
What I want is to set the y column to the result of applying add_one() to the x column, but only for the rows where run_on is TRUE; the other rows should keep their current y value. The end result should look like this:
> x y run_on
>1 1 2 TRUE
>2 2 2 FALSE
>3 3 4 TRUE
>4 4 4 FALSE
I have been able to iterate the function over all of the rows using apply(). So for example:
test$y <- apply(test, 1, add_one, x_id = 1)
test
> x y run_on
>1 1 2 TRUE
>2 2 3 FALSE
>3 3 4 TRUE
>4 4 5 FALSE
But this also applies the function to rows 2 and 4, which I do not want. I suspect there may be a way to do this with one of the map() variants from the purrr package, which is why I tagged this post as such.
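One way to keep the apply() call but restrict it to the flagged rows is to subset the data frame with the run_on column first and assign only into those positions of y. This is a minimal base-R sketch against the toy data above, not a tested solution for the real function; the name rows is introduced here purely for illustration.

```r
add_one <- function(vector, x_id) {
  return(vector[x_id] + 1)
}

test <- data.frame(x = c(1, 2, 3, 4), y = c(1, 2, 3, 4),
                   run_on = c(TRUE, FALSE, TRUE, FALSE))

# Logical index of the rows the function should run on
# (`rows` is a hypothetical helper name, not part of the original code).
rows <- test$run_on

# Skip the call entirely when no row is flagged, so apply() never
# sees an empty data frame.
if (any(rows)) {
  # apply() only receives the flagged rows; the other y values are untouched.
  test$y[rows] <- apply(test[rows, ], 1, add_one, x_id = 1)
}

test
```

Because the assignment targets test$y[rows], rows 2 and 4 keep whatever y they already had, regardless of what the function would have returned for them.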
In reality, I am using this kind of procedure to iterate over a large dataset many times, so I need it to run automatically and cleanly. Any help or suggestions would be very much appreciated.
UPDATE
I managed to find a solution. Some of the solutions offered here did work in my toy example but did not extend to the more complex function I was actually using. Ultimately what worked was something similar to what tmfmnk suggested. I just wrapped the original function inside another function that included an if statement to determine whether or not to apply the original function. So to extend my toy example, my solution looks like this:
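add_one_if <- function(vector, x_id, y_id, run_on_id) {
  # apply() passes each row as a numeric vector, so TRUE arrives as 1 and FALSE as 0
  if (vector[run_on_id]) {
    # flagged row: compute the new value from x
    return(add_one(vector, x_id))
  } else {
    # unflagged row: keep the existing y value
    return(vector[y_id])
  }
}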
test$y <- apply(test, 1, add_one_if, x_id = 1, y_id = 2, run_on_id = 3)
It seems a little convoluted, but it worked for me and is reproducible and reliable in the way I need it to be.
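For what it is worth, the same "wrap it in an if" idea can probably be written with purrr, which is what I originally had in mind. The following is only a sketch under the assumption that purrr is installed and that the data frame has exactly the columns x, y and run_on; the anonymous function and the name row are made up for this illustration.

```r
library(purrr)

add_one <- function(vector, x_id) {
  return(vector[x_id] + 1)
}

test <- data.frame(x = c(1, 2, 3, 4), y = c(1, 2, 3, 4),
                   run_on = c(TRUE, FALSE, TRUE, FALSE))

# pmap_dbl() walks the data frame row by row, passing each column
# value as a named argument (x, y, run_on) to the function.
test$y <- pmap_dbl(test, function(x, y, run_on) {
  row <- c(x = x, y = y)  # rebuild a row vector for add_one()
  if (run_on) {
    add_one(row, x_id = 1)[[1]]  # [[1]] drops the element name
  } else {
    y  # leave unflagged rows as they are
  }
})

test
```

Unlike apply(), this does not push the whole row through a matrix conversion, so run_on stays a logical inside the function.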
with(test, y + run_on)
;) – markus