5
votes

Consider a vector:

int = c(1, 1, 0, 5, 2, 0, 0, 2)

I'd like to get, for each element, the distance in positions (the difference in index, not in value) to the closest subsequent occurrence of a specified value. The first parameter of the function should be the vector, and the second should be the value to search for.

For instance,

f(int, 0)
# [1] 2 1 0 2 1 0 0 NA

Here, the first element of the vector (1) is two positions away from the first subsequent 0 (3 - 1 = 2), so the function should return 2. The second element is one position away from that 0 (3 - 2 = 1). An element that itself equals the value counts as distance 0. When no subsequent value matches the specified value, the function should return NA (this is the case for the last element, because no 0 follows it).
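
The arithmetic above can be reproduced directly from the positions of the matches. This is only an illustration of the expected rule, not a proposed solution; the names `pos` and `dist_from` are made up for this sketch.

```r
int <- c(1, 1, 0, 5, 2, 0, 0, 2)

# Positions at which the value 0 occurs
pos <- which(int == 0)
pos
# [1] 3 6 7

# Hypothetical helper: distance from position i to the first match at or after i
dist_from <- function(i) {
  ahead <- pos[pos >= i]
  if (length(ahead) == 0) NA else ahead[1] - i
}

dist_from(1)  # 3 - 1
# [1] 2
dist_from(2)  # 3 - 2
# [1] 1
dist_from(8)  # no 0 at or after position 8
# [1] NA
```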

Other examples:

f(int, 1)
# [1] 0 0 NA NA NA NA NA NA

f(int, 2) 
# [1] 4 3 2 1 0 2 1 0

f(int, 3) 
# [1] NA NA NA NA NA NA NA NA

This should also work for character vectors:

char = c("A", "B", "C", "A", "A")

f(char, "A") 
# [1] 0 2 1 0 0
5
In your last char example, could you explain the output? Why are we getting 0, 2, 1, 0, 0? - zx8754
Yes. f should return the distance to the closest following value that equals "A". The first value of char is itself "A", so the result is 0. For the second, it is 2, because "B" is two positions away from the following "A" (4 - 2 = 2). Does it make more sense now? - Maël

5 Answers

4
votes

Look for the match from nth position to the end of the vector, then get the 1st match:

f <- function(v, x){
  sapply(seq_along(v), function(i){
    which(v[ i:length(v) ] == x)[ 1 ] - 1
  })
}

f(int, 0)
# [1]  2  1  0  2  1  0  0 NA
f(int, 1)
# [1]  0  0 NA NA NA NA NA NA
f(int, 2)
# [1] 4 3 2 1 0 2 1 0
f(int, 3) 
# [1] NA NA NA NA NA NA NA NA

f(char, "A") 
# [1] 0 2 1 0 0
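
To see why this works, it may help to trace a single iteration by hand. The sketch below follows the steps of the function for i = 4 and i = 8 with x = 0; the variable names `tail4` and `tail8` are introduced only for illustration.

```r
int <- c(1, 1, 0, 5, 2, 0, 0, 2)

# i = 4: keep elements 4..8
tail4 <- int[4:length(int)]
tail4
# [1] 5 2 0 0 2

# Positions of 0 within the slice; position 1 is the element itself
which(tail4 == 0)
# [1] 3 4

# First match, shifted so that "the element itself" counts as distance 0
which(tail4 == 0)[1] - 1
# [1] 2

# i = 8: the slice holds only the last element, which is not 0
tail8 <- int[8:length(int)]
which(tail8 == 0)[1] - 1   # integer(0)[1] is NA, and NA - 1 is NA
# [1] NA
```

Because each iteration rescans the rest of the vector, the work grows roughly quadratically with the vector length in the worst case, which is unlikely to matter for short inputs.
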
3
votes

Here f is defined as a recursive function that calls itself over shorter tails of the lookup vector:

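f <- function(lookup, val) {
  # distance from the first element to the first match (NA if there is none)
  ind <- which(lookup == val)[1] - 1
  if (length(lookup) > 1) {
    # drop the first element and recurse on the remaining tail
    c(ind, f(lookup[-1], val))
  } else {
    ind
  }
}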
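
The answer does not show any output, so here is a quick check against the examples from the question. The function is repeated under the hypothetical name `f_rec` to keep the snippet self-contained. Note that R limits how deeply calls can nest, so a recursive version like this is probably best kept to fairly short vectors.

```r
# Same recursive definition as above, renamed f_rec for this sketch
f_rec <- function(lookup, val) {
  ind <- which(lookup == val)[1] - 1
  if (length(lookup) > 1) {
    c(ind, f_rec(lookup[-1], val))
  } else {
    ind
  }
}

int <- c(1, 1, 0, 5, 2, 0, 0, 2)
char <- c("A", "B", "C", "A", "A")

f_rec(int, 0)
# [1]  2  1  0  2  1  0  0 NA
f_rec(int, 3)
# [1] NA NA NA NA NA NA NA NA
f_rec(char, "A")
# [1] 0 2 1 0 0
```
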
2
votes

Find the location of each value (numeric or character)

int = c(1, 1, 0, 5, 2, 0, 0, 2)
value = 0
idx = which(int == value)
## [1] 3 6 7

Expand the index to indicate the nearest value of interest, using an NA after the last value in int.

nearest = rep(NA, length(int))
nearest[1:max(idx)] = rep(idx, diff(c(0, idx)))
## [1]  3  3  3  6  6  6  7 NA
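
The `rep()` call is the least obvious step, so a short breakdown may help. `diff(c(0, idx))` gives the number of positions that each match is responsible for, counting from the previous match. The name `run_lengths` below is mine, not part of the answer.

```r
int <- c(1, 1, 0, 5, 2, 0, 0, 2)
idx <- which(int == 0)   # 3 6 7

# How many positions map to each match: 1..3 -> 3, 4..6 -> 6, 7 -> 7
run_lengths <- diff(c(0, idx))
run_lengths
# [1] 3 3 1

# Repeat each match index by its run length
rep(idx, run_lengths)
# [1] 3 3 3 6 6 6 7
```

The repeated vector has max(idx) elements, which is why it is assigned to the first max(idx) positions of nearest; positions after the last match keep their initial NA.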

Use simple arithmetic to find the difference between the index of the current value and the index of the nearest value

abs(seq_along(int) - nearest)
## [1]  2  1  0  2  1  0  0 NA

Written as a function

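f <- function(x, value) {
    idx = which(x == value)
    # no match anywhere: every position is NA (max(idx) would fail on an empty idx)
    if (length(idx) == 0) return(rep(NA_integer_, length(x)))
    nearest = rep(NA, length(x))
    nearest[1:max(idx)] = rep(idx, diff(c(0, idx)))
    abs(seq_along(x) - nearest)
}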

We have

> f(int, 0)
[1]  2  1  0  2  1  0  0 NA
> f(int, 1)
[1]  0  0 NA NA NA NA NA NA
> f(int, 2)
[1] 4 3 2 1 0 2 1 0
> f(char, "A")
[1] 0 2 1 0 0
> f(char, "B")
[1]  1  0 NA NA NA
> f(char, "C")
[1]  2  1  0 NA NA

The solution doesn't involve recursion or R-level loops, so it should be fast even for long vectors.
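
One way to probe that claim is to compare this function with a simple `sapply()`-based version on a longer random vector. The sketch below is a rough check rather than a proper benchmark: the names `f_vec` and `f_loop`, the vector size and the seed are all arbitrary choices of mine, and `f_vec` includes a guard for the no-match case that I added. No timings are quoted because they depend on the machine.

```r
# Vectorised version, with an added guard for "value not found"
f_vec <- function(x, value) {
  idx <- which(x == value)
  if (length(idx) == 0) return(rep(NA_integer_, length(x)))
  nearest <- rep(NA_integer_, length(x))
  nearest[seq_len(max(idx))] <- rep(idx, diff(c(0L, idx)))
  abs(seq_along(x) - nearest)
}

# Loop-style reference: rescan the tail of the vector from each position
f_loop <- function(v, x) {
  sapply(seq_along(v), function(i) which(v[i:length(v)] == x)[1] - 1)
}

set.seed(1)                                  # arbitrary seed
big <- sample(0:5, 1e4, replace = TRUE)      # arbitrary test vector

# Both versions should agree, including the positions of any NA
stopifnot(isTRUE(all.equal(f_vec(big, 0), f_loop(big, 0))))

# Elapsed times vary by machine, so they are printed rather than quoted here
print(system.time(f_vec(big, 0)))
print(system.time(f_loop(big, 0)))
```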

1
votes

Here is an approach using Reduce() and then some fiddling to get the NA values.

f <- function(vec, value) {
  replace(
    Reduce(
      function(x, y) x + (y * x),
      vec != value,
      right = TRUE,
      accumulate = TRUE
    ),
    max(tail(which(vec == value), 1), 0) < seq_along(vec),
    NA
  )
}

f(int, 0)          
[1]  2  1  0  2  1  0  0 NA

f(int, 1)          
[1]  0  0 NA NA NA NA NA NA

f(int, 2) 
[1] 4 3 2 1 0 2 1 0

f(int, 3) 
[1] NA NA NA NA NA NA NA NA

char = c("A", "B", "C", "A", "A")

f(char, "A") 
[1] 0 2 1 0 0
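
It may help to look at what `Reduce()` returns before the `NA` replacement. Folding from the right, each non-matching element adds 1 to the count carried over from its right-hand neighbour, and each match resets the count to 0. The variable names `not_zero`, `counts` and `after_last` are only for this illustration.

```r
int <- c(1, 1, 0, 5, 2, 0, 0, 2)

# TRUE where the element is NOT the value we are looking for
not_zero <- int != 0

# Right-to-left running count: x is the current element, y the count so far
counts <- Reduce(function(x, y) x + (y * x), not_zero,
                 right = TRUE, accumulate = TRUE)
counts
# [1] 2 1 0 2 1 0 0 1

# Positions after the last match have no 0 ahead of them, so they become NA
after_last <- max(tail(which(int == 0), 1), 0) < seq_along(int)
replace(counts, after_last, NA)
# [1]  2  1  0  2  1  0  0 NA
```
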
1
votes

Another possible solution, based on purrr::map2_dbl:

library(purrr)

int = c(1, 1, 0, 5, 2, 0, 0, 2)

f <- function(int, num)
{
  n <- length(int)
  
  map2_dbl(num, 1:n, ~ ifelse(length(which(.x == int[.y:n])) == 0, NA, 
      min(which(.x == int[.y:n])) - 1))
}

f(int, 0)
#> [1]  2  1  0  2  1  0  0 NA

f(int, 1)          
#> [1]  0  0 NA NA NA NA NA NA

f(int, 2) 
#> [1] 4 3 2 1 0 2 1 0

f(int, 3) 
#> [1] NA NA NA NA NA NA NA NA

char = c("A", "B", "C", "A", "A")

f(char, "A") 
#> [1] 0 2 1 0 0