4
votes

I am trying to find patterns across rows of a data.table while still maintaining the linkages of data across the rows. Here is a reduced example:

Row ID Value
1   C  1000
2   A  500
3   T  -200
4   B  5000
5   T  -900
6   A  300

I would like to search for all instances of "ATB" in successive rows and output the integers from the value column. Ideally, I want to bin the number of instances as well. The output table would look like this:

String Frequency Value1 Value2 Value 3
ATB      1        500   -200    5000
CAT      1        1000   500    -200

Since the data.table packages seems to be oriented towards providing operations on a column or row-wise basis I thought this should be possible. However, I haven't the slightest idea where to start. Any pointers in the right direction would be greatly appreciated.

Thanks!

1
Could you show a few more lines with expected output - akrun
Based on the expected output showed, you selected the first 3 rows for CAT, 2:4 for ATB, so what about 3:5 and so on... - akrun
If you were searching only for "ATB", how did you end up with "CAT"? - Frank
And how will this look like if there several ATB combinations? You have Frequency there, so you want to combine somehow their values? - David Arenburg

1 Answers

0
votes
library("plyr")
library("stringr")
df <- read.table(header = TRUE, text = "Row ID Value
                 1   C  1000
                 2   A  500
                 3   T  -200
                 4   B  5000
                 5   T  -900
                 6   A  300
                 7   C  200
                 8   A  700
                 9   T  -500")
sought <- c("ATB", "CAT", "NOT")
ids <- paste(df$ID, collapse = "")
ldply(sought, function(id) {
    found <- str_locate_all(ids, id)
      if (nrow(found[[1]])) {
            vals <- outer(found[[1]][,"start"], 0:2, function(x, y) df$Value[x + y])
      } else {
            vals <- as.list(rep(NA, 3))
      }
      data.frame(ID = id, Count = str_count(ids, id),
                 setNames(as.data.frame(vals), paste0("Value", 1:3)))
})

Here's a solution using stringr and plyr. The ids are collapsed into a single string, all instances of each target located and then a data frame constructed with the relevant columns.