
I have a function that I want to iterate over only certain rows of my dataset, and then save the results in a variable in the dataset.

So for example, say I have this setup:

library(tidyverse)

add_one <- function(vector, x_id){
  return(vector[x_id] + 1)
}

test <- data.frame(x = c(1,2,3,4), y = c(1,2,3,4), run_on = c(TRUE,FALSE,TRUE,FALSE))
test

So the test data frame looks like:

>  x y run_on
>1 1 1   TRUE
>2 2 2  FALSE
>3 3 3   TRUE
>4 4 4  FALSE

So what I want to do is iterate over the data frame and set the y column to the result of applying add_one() to the x column, but only for the rows where run_on is TRUE. I want the end result to look like this:

>  x y run_on
>1 1 2   TRUE
>2 2 2  FALSE
>3 3 4   TRUE
>4 4 4  FALSE

I have been able to iterate the function over all of the rows using apply(). So for example:

test$y <- apply(test,1,add_one,x_id = 1)
test

>  x y run_on
>1 1 2   TRUE
>2 2 3  FALSE
>3 3 4   TRUE
>4 4 5  FALSE

But this also applies the function to rows 2 and 4, which I do not want. I suspect there may be a way to do this with one of the map() functions from purrr, which is why I tagged this post as such.

In reality, I use this kind of procedure to iterate over a large dataset many times, so it needs to run automatically and cleanly. Any help or suggestions would be very much appreciated.
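Since purrr came up, here is a minimal row-wise purrr sketch, assuming purrr is installed and using the add_one() and test objects from above. pmap_dbl() calls the anonymous function once per row, passing each column as a named argument, and expects a single number back, so it fits functions that return one value per row. It is one possible approach, not the only one.

```r
library(purrr)

# Setup from the question
add_one <- function(vector, x_id) {
  return(vector[x_id] + 1)
}
test <- data.frame(x = c(1, 2, 3, 4), y = c(1, 2, 3, 4),
                   run_on = c(TRUE, FALSE, TRUE, FALSE))

# pmap_dbl() walks the data frame row by row; columns are matched to the
# function's arguments by name. Rows with run_on == FALSE keep their y.
test$y <- pmap_dbl(test, function(x, y, run_on) {
  if (run_on) add_one(x, 1) else y
})
test
```

If the real data frame has more columns than the function uses, adding `...` to the anonymous function's arguments absorbs the extra ones; without it, pmap_dbl() stops with an unused-argument error.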

UPDATE

I managed to find a solution. Some of the answers here worked for my toy example but did not extend to the more complex function I am actually using. What ultimately worked was similar to what tmfmnk suggested: I wrapped the original function inside another function with an if statement that decides whether or not to apply the original function. Extending my toy example, my solution looks like this:

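add_one_if <- function(vector, x_id, y_id, run_on_id){
    # apply() passes each row as a numeric vector, so run_on arrives as 1/0
    if (vector[run_on_id]) {
        return(add_one(vector, x_id))
    } else {
        # keep the row's existing y value
        return(vector[y_id])
    }
}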

test$y <- apply(test, 1, add_one_if, x_id = 1, y_id = 2, run_on_id = 3)

It seems a little convoluted, but it worked for me and is reproducible and reliable in the way I need it to be.
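The wrapper idea can be generalized so the function to apply is passed in as an argument, which keeps it reusable for functions other than add_one(). A minimal sketch; `apply_if()`, `flag`, and `fallback` are hypothetical names, not from any package. Columns are referenced by name because apply() keeps the column names on each row vector.

```r
# Setup from the question
add_one <- function(vector, x_id) {
  return(vector[x_id] + 1)
}
test <- data.frame(x = c(1, 2, 3, 4), y = c(1, 2, 3, 4),
                   run_on = c(TRUE, FALSE, TRUE, FALSE))

# Hypothetical generic wrapper: calls f(row, ...) when the flag column is
# TRUE, otherwise returns the row's current value in the fallback column.
# apply() turns each row into a numeric vector, so the flag arrives as 1/0.
apply_if <- function(row, f, flag, fallback, ...) {
  if (as.logical(row[[flag]])) {
    f(row, ...)
  } else {
    row[[fallback]]
  }
}

# Extra named arguments (here x_id) are forwarded by apply() to apply_if()
# and from there to f()
test$y <- apply(test, 1, apply_if, f = add_one,
                flag = "run_on", fallback = "y", x_id = "x")
test
```

This stays clean only while every column is numeric or logical. With a character column, apply() converts the whole data frame to a character matrix and each row arrives as strings.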

with(test, y + run_on) ;) – markus
This is a clever solution if jcr is really only about adding one to a column. However, it sounds like this is just an example, in which case this won't generalize to other functions. – natej

3 Answers


You can also do:

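add_one <- function(data, vector, x_id, n, is.true = TRUE) {
  # Logical columns coerce to 1/0, so this adds n only where the flag column
  # is TRUE (or only where it is FALSE when is.true = FALSE)
  if (is.true) {
    return(data[[vector]] + (data[[x_id]]) * n)
  } else {
    return(data[[vector]] + (!data[[x_id]]) * n)
  }
}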

add_one(test, vector = "y", x_id = "run_on", 1, is.true = TRUE)

[1] 2 2 4 4

add_one(test, vector = "y", x_id = "run_on", 5, is.true = FALSE)

[1] 1 7 3 9

It may be that your real case is more complicated than this allows, but why not just use ifelse()?

test$y <- ifelse(test$run_on, add_one(test, "x")$x, test$y)

Or even:

test$y[test$run_on] <- add_one(test[test$run_on, ], "x")$x
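The same subset-and-assign idea also works with the row-wise apply() call from the question, which helps when the real function needs to see a whole row. A minimal sketch, assuming the question's add_one() and test; the function is only called on the flagged rows, which may matter if it is expensive.

```r
# Setup from the question
add_one <- function(vector, x_id) {
  return(vector[x_id] + 1)
}
test <- data.frame(x = c(1, 2, 3, 4), y = c(1, 2, 3, 4),
                   run_on = c(TRUE, FALSE, TRUE, FALSE))

# Indices of the rows that should be updated
rows <- which(test$run_on)

# Run apply() only on those rows; the other rows keep their current y
test$y[rows] <- apply(test[rows, ], 1, add_one, x_id = "x")
test
```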

You won't need purrr unless you are applying the same function to multiple columns. Since you only want to modify one column based on a condition, you can use mutate() with case_when().

# add_one() is from the question; seq_along(x) selects every element of x
mutate(test, y = case_when(run_on ~ add_one(x, seq_along(x)),
                           !run_on ~ y))
#>   x y run_on
#> 1 1 2   TRUE
#> 2 2 2  FALSE
#> 3 3 4   TRUE
#> 4 4 4  FALSE
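Note that case_when() evaluates every right-hand side on the full column, so add_one() still runs on rows 2 and 4 before those results are discarded. A sketch that only computes the flagged rows uses base R's replace() inside mutate(), assuming the question's add_one(), whose x_id argument also accepts a logical index.

```r
library(dplyr)

# Setup from the question
add_one <- function(vector, x_id) {
  return(vector[x_id] + 1)
}
test <- data.frame(x = c(1, 2, 3, 4), y = c(1, 2, 3, 4),
                   run_on = c(TRUE, FALSE, TRUE, FALSE))

# add_one(x, run_on) uses the logical column as an index, so it only
# computes x + 1 for the flagged rows; replace() writes those values back
# into y at the same positions and leaves the other rows untouched.
result <- mutate(test, y = replace(y, run_on, add_one(x, run_on)))
result
```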