7
votes

I would like to return a new column in a data.table which shows how many rows down until a value lower than the current value (of Temp) is reached.

library(data.table)
set.seed(123)
DT <- data.table( Temp = runif(10,0,20) )

This is how I would like it to look:

set.seed(123)
DT <- data.table(
        Temp = runif(10,0,20),
        Day_Below_Temp = c("5","1","3","2","1","NA","3","1","1","NA")
)
3

3 Answers

4
votes

Using the newly implemented non-equi joins in the current development version, this can be accomplished in a straightforward manner as follows:

require(data.table) # v1.9.7+
DT[, row := .I] # add row numbers
DT[DT, x.row-i.row, on = .(row > row, Temp < Temp), mult="first"]
# [1]  5  1  3  2  1 NA  3  1  1 NA

The row number is necessary since we need to find indices lower than the current index, hence needs to be a condition in the join. We perform a self-join, i.e., for each row in DT (inner), based on condition provided to on argument, we find the first matching row index in DT (outer). Then we subtract the row indices to get the position from the current row. x.row refers to the index of outer DT and i.row to the inner DT.

To get the devel version, see installation instructions here.


On 1e5 rows:

set.seed(123)
DT <- data.table(Temp = runif(1e5L, 0L, 20L))

DT[, row := .I]
system.time({
    ans = DT[DT, x.row-i.row, on = .(row > row, Temp < Temp), mult="first", verbose=TRUE]
})
# Non-equi join operators detected ... 
#   forder took ... 0.001 secs
#   Generating non-equi group ids ... done in 0.452 secs
#   Recomputing forder with non-equi ids ... done in 0.001 secs
#   Found 623 non-equi group(s) ...
# Starting bmerge ...done in 8.118 secs
# Detected that j uses these columns: x.row,i.row 
#    user  system elapsed 
#   8.492   0.038   8.577 

head(ans)
# [1]  5  1  3  2  1 12
tail(ans)
# [1]  2  1  1  2  1 NA
2
votes

Here's a dplyr method:

library(dplyr)
set.seed(123)
dt <- data.frame( Temp = runif(10,0,20) )
dt %>% mutate(Day_Below_Temp = 
                 sapply(1:length(Temp), function(x) min(which(.$Temp[x:length(.$Temp)] < .$Temp[x]))-1))

        Temp Day_Below_Temp
1   5.751550              5
2  15.766103              1
3   8.179538              3
4  17.660348              2
5  18.809346              1
6   0.911130            Inf
7  10.562110              3
8  17.848381              1
9  11.028700              1
10  9.132295            Inf
1
votes

This does the job - not very fast though

DT[, rowN := .I]

DT[, Day_Below_Temp := which(DT$Temp[rowN:nrow(DT)] < Temp)[1] - 1, 
   by = rowN
   ][, rowN := NULL]