
I have two character vectors. For each string in rules, I need to check which strings in claims contain it, so I'm using stri_detect_fixed (from stringi) with lapply (quite fast).

> summary(claims)
   Length     Class      Mode 
   960322 character character 
> summary(rules)
   Length     Class      Mode 
       50 character character 

> foo <- function(Match){
+   stri_detect_fixed(claims, Match)
+ }

> system.time(lapply(rules,foo))
   user  system elapsed 
  39.04    0.33   39.39 

The result of lapply looks like this:

[[1]]
   [1] FALSE FALSE FALSE FALSE FALSE FALSE ... #960322 values
[[2]]
   [1] FALSE FALSE FALSE  TRUE FALSE FALSE ... 
...
[[50]]
   [1] FALSE FALSE FALSE ...

My question is: how can I get a vector of length 50 that has FALSE (or 0) for a rule if every value in its element of the list is FALSE, and TRUE (or 1) if at least one value is TRUE?

I guess I could save the result of lapply as a data frame and work with that, but I was wondering whether it can be done with lapply directly.
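Doing it directly inside the apply call is possible by wrapping the detection in any(), which collapses each long logical vector to a single TRUE/FALSE. A minimal sketch, assuming small toy claims and rules vectors (hypothetical data, not the question's) and using base R's grepl(..., fixed = TRUE) in place of stringi::stri_detect_fixed so it runs without extra packages:

```r
# Toy stand-ins for the question's 960322 claims and 50 rules (hypothetical data)
claims <- c("broken windshield", "water damage in kitchen",
            "stolen bicycle", "hail damage to roof")
rules  <- c("hail", "flood", "damage", "theft")

# any() reduces each rule's logical vector to one value, so vapply
# returns one TRUE/FALSE per rule instead of a list of long vectors.
# grepl(fixed = TRUE) stands in for stringi::stri_detect_fixed here.
hit <- vapply(rules,
              function(r) any(grepl(r, claims, fixed = TRUE)),
              logical(1))
hit              # named logical: hail TRUE, flood FALSE, damage TRUE, theft FALSE

as.integer(hit)  # 0/1 version; as.integer() drops the names
# [1] 1 0 1 0
```

With stringi loaded, the inner call can be any(stri_detect_fixed(claims, r)) instead. vapply is used rather than sapply so the result is guaranteed to be one logical per rule.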

Use the sum function: FALSE counts as 0 and TRUE as 1, so the result is at least 1 if there is any TRUE. Try something like sapply(lapply(rules, foo), sum). - Dave2e
sapply(lapply(rules,foo),sum) works great, thank you! I'm just angry I couldn't think of that :P - Hoju
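Dave2e's suggestion works because sum() coerces a logical vector to integers (FALSE to 0, TRUE to 1), so it counts the TRUE values. A small sketch with a hand-made list standing in for the lapply() result (hypothetical values):

```r
# Hypothetical stand-in for lapply(rules, foo): one logical vector per rule
res <- list(c(FALSE, FALSE, FALSE),        # rule with no matching claim
            c(FALSE, TRUE, FALSE, TRUE))   # rule matching two claims

counts <- sapply(res, sum)   # sum() counts the TRUEs in each element
counts
# [1] 0 2

counts > 0                   # TRUE where at least one claim matched
# [1] FALSE  TRUE
```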

1 Answer


EDIT:

@Dave2e solved my question in the comments.

Using sapply(lapply(rules,foo),sum) gives 0 for a rule whose vector is all FALSE, and otherwise the number of TRUEs, i.e. how many claims contain that rule:

> sapply(lapply(rules,foo),sum)
 [1]  0  0  0  0  0  0  0  0  0  0  1  0  1  1  1  0  0  0  0  0  0  0  2  1  4 10  0  0  0  5  2  0  0  0  1  0 10  1  1  0  1  2  9  0  1 10  0  0  0  2

So using replace to set every non-zero count to 1 gives what I wanted:

> x = sapply(lapply(rules,foo),sum)
> replace(x, x!=0, 1)
 [1] 0 0 0 0 0 0 0 0 0 0 1 0 1 1 1 0 0 0 0 0 0 0 1 1 1 1 0 0 0 1 1 0 0 0 1 0 1 1 1 0 1 1 1 0 1 1 0 0 0 1
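replace() works, but comparing the counts with zero gives the same 0/1 vector more directly. A sketch, assuming x holds counts like those printed above (the values here are a made-up subset):

```r
# Hypothetical counts in the style of sapply(lapply(rules, foo), sum)
x <- c(0L, 1L, 0L, 4L, 10L, 2L)

flags <- as.integer(x > 0)   # 1 where a rule matched at least one claim, else 0
flags
# [1] 0 1 0 1 1 1

# Same values as the replace() call above (which returns doubles)
all(flags == replace(x, x != 0, 1))
# [1] TRUE
```

If TRUE/FALSE is enough, x > 0 on its own already gives the logical version.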