I've got the following code, which I expect to give me a list of 3, since there are 3 elements in texts:
library(stringr)
texts <- c("I doubt it! :)", ";) disagree, but ok.", "No emoticons here!!!")
smileys <- c(":)","(:",";)",":D")
str_extract_all(texts, fixed(smileys))
Instead, I get a list of four (the length of my "pattern" parameter, here smileys). Additionally, I get the following warning message:
Warning message: In stri_extract_all_fixed(string, pattern, simplify = simplify, : longer object length is not a multiple of shorter object length
Well, I wouldn't expect the lengths to match, as I'm looking for any hits on any of the smileys in each text. It's not like I want to match string 1 with pattern 1, string 2 with pattern 2, etc.
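To make that pairwise behaviour concrete, here is a minimal sketch. The name pairwise and the subset smileys[1:3] are only for illustration and are not part of my real code. When string and pattern have the same length, element i of the string is only compared with element i of the pattern:

```r
library(stringr)

texts <- c("I doubt it! :)", ";) disagree, but ok.", "No emoticons here!!!")
smileys <- c(":)", "(:", ";)", ":D")

# Equal lengths: text 1 is checked against ":)" only, text 2 against "(:" only,
# text 3 against ";)" only. No text is checked against every smiley.
pairwise <- str_extract_all(texts, fixed(smileys[1:3]))

length(pairwise)   # one list element per text
pairwise[[2]]      # empty: ";)" is in text 2, but only "(:" was tried
```

So the ";)" in the second text is missed, which is exactly what I do not want.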
Aware that I am messing up stringi's understanding of vectorizing, I have tried this instead:
texts %>% map(~ str_extract_all(.x, fixed(smileys)))
This is much better, as it gives me a list of 3, but each element is in turn a list of four.
What I'm trying to get to is a list of 3 that is as little nested as possible. Someone, somewhere, has solved this, but I can't for the life of me figure it out or get how to google it. I could do a for loop over this, but I consider myself a citizen of the tidyverse...
Grateful for any assistance.
stringr, but I believe you may have a look at "grep using a character vector with multiple patterns". If you pursue the "paste" with "collapse = |" method, then you might need to consider "How do I deal with special characters like \^$.?*|+()[{ in my regex?" – Henrik
pattern <- paste("\\Q", smileys, "\\E", sep = "", collapse = "|"); stringi::stri_extract_all_regex(texts, pattern) – Jota
The \\Q...\\E method is described in the second link I provided. – Henrik
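For readers who want the comment one-liner spelled out, here is a slightly expanded sketch of the same idea, assuming stringi is installed. The name hits is hypothetical, and passing omit_no_match = TRUE is an addition beyond what the comment showed; without it, texts with no match come back as NA rather than an empty vector.

```r
library(stringi)

texts <- c("I doubt it! :)", ";) disagree, but ok.", "No emoticons here!!!")
smileys <- c(":)", "(:", ";)", ":D")

# Wrap each smiley in \Q...\E so that ")", "(" and similar characters are
# treated literally, then join the alternatives with "|" into one regex.
pattern <- paste0("\\Q", smileys, "\\E", collapse = "|")

# A single pattern is applied to every text, so the result has one
# element per text and no inner list per smiley.
hits <- stri_extract_all_regex(texts, pattern, omit_no_match = TRUE)

length(hits)
hits
```

Because there is now only one pattern, nothing is recycled against texts, and the result is a list of 3 character vectors, the last of which is empty.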