I have this data.frame:

c=c("AUSTOP","ATIP;AITUO", "BERTUI", "BHTREAK;PAERJR;KIEYTU", "FRTEU3", "IRTUFH","HAZEB", "ERTUJG;JIRTUE;HERTYE", "DAIER1", "ZERV")

pos=c(1,2,3,4,5,6,7,8,9,10)
data=data.frame(pos, c)

> data
   pos                     c
1    1                AUSTOP
2    2            ATIP;AITUO
3    3                BERTUI
4    4 BHTREAK;PAERJR;KIEYTU
5    5                FRTEU3
6    6                IRTUFH
7    7                 HAZEB
8    8  ERTUJG;JIRTUE;HERTYE
9    9                DAIER1
10  10                  ZERV

And this vector:

xa=c("AUSTOP", "HTURIE", "IRTUFH", "JEURTU", "AITUO", "ERTUJG", "HERTYE", "DAIER", "ZERV1")

I would like to create a new variable that holds, for each row, all the strings present both in the c column and in the xa vector, as follows:

> data2
   pos                     c           xab
1    1                AUSTOP        AUSTOP
2    2            ATIP;AITUO         AITUO
3    3                BERTUI        BERTUI
4    4 BHTREAK;PAERJR;KIEYTU          <NA>
5    5                FRTEU3          <NA>
6    6                IRTUFH        IRTUFH
7    7                 HAZEB          <NA>
8    8  ERTUJG;JIRTUE;HERTYE ERTUJG;HERTYE
9    9                DAIER1          <NA>
10  10                  ZERV          <NA>

A tidyverse solution would be great.

In your xa vector there is no 'BERTUI' - akrun
@Patrick Tip: don't use c as a variable name. Because c is also a function, you could end up doing something unintended with that name. - user438383
Thank you for pointing these out. I agree that using c as a variable name is not a good idea. I don't usually do it; it was just for the example, but I will avoid it even in examples. - Patrick Parts

1 Answer


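We can build a single regex pattern by pasting the elements of 'xa' together with collapse = "|" and wrapping the result in word boundaries (\\b). str_extract_all then pulls every string matching that pattern from the 'c' column, and we loop over the resulting list to paste each row's matches together with ";". In the output below, rows with no match show an empty string rather than NA.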

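library(dplyr)
library(stringr)
library(purrr)

# One alternation of all 'xa' values, wrapped in word boundaries so that
# only whole tokens match (e.g. "DAIER" does not match inside "DAIER1")
pattern <- str_c("\\b(", str_c(xa, collapse = "|"), ")\\b")

# Extract every match per row, then join each row's matches with ";"
data %>% 
    mutate(new = map_chr(str_extract_all(c, pattern), str_c, collapse = ";"))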

-output

  pos                     c           new
1    1                AUSTOP        AUSTOP
2    2            ATIP;AITUO         AITUO
3    3                BERTUI        BERTUI
4    4 BHTREAK;PAERJR;KIEYTU              
5    5                FRTEU3              
6    6                IRTUFH        IRTUFH
7    7                 HAZEB              
8    8  ERTUJG;JIRTUE;HERTYE ERTUJG;HERTYE
9    9                DAIER1              
10  10                  ZERV  
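
The question asked for NA where nothing matches, while the output above shows empty strings. One possible way to get NA is to split each value on ";" and keep only the pieces that also occur in xa. This is a minimal sketch and not part of the original answer: keep_shared is a hypothetical helper name, the result column is called xab as in the question, and xa is assumed to be the version that includes "BERTUI".

```r
library(dplyr)
library(purrr)
library(stringr)

# Assumption: 'xa' includes "BERTUI", as in the answer's data section
xa <- c("AUSTOP", "HTURIE", "IRTUFH", "JEURTU", "AITUO", "ERTUJG",
        "HERTYE", "DAIER", "ZERV1", "BERTUI")
data <- data.frame(
  pos = 1:10,
  c = c("AUSTOP", "ATIP;AITUO", "BERTUI", "BHTREAK;PAERJR;KIEYTU", "FRTEU3",
        "IRTUFH", "HAZEB", "ERTUJG;JIRTUE;HERTYE", "DAIER1", "ZERV")
)

# Hypothetical helper: keep the tokens of 'x' that also appear in 'keep';
# return NA when none do, otherwise join them with ";"
keep_shared <- function(x, keep) {
  hits <- intersect(x, keep)
  if (length(hits) == 0) NA_character_ else paste(hits, collapse = ";")
}

data2 <- data %>%
  mutate(xab = map_chr(str_split(c, fixed(";")), keep_shared, keep = xa))

data2
```

Because this compares whole tokens with intersect instead of using a regular expression, values in xa that contain regex metacharacters need no escaping. Note that intersect also drops duplicate tokens within a row.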

data

xa <- c("AUSTOP", "HTURIE", "IRTUFH", "JEURTU", "AITUO", "ERTUJG", 
"HERTYE", "DAIER", "ZERV1", "BERTUI")
data <- structure(list(pos = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), c = c("AUSTOP", 
"ATIP;AITUO", "BERTUI", "BHTREAK;PAERJR;KIEYTU", "FRTEU3", "IRTUFH", 
"HAZEB", "ERTUJG;JIRTUE;HERTYE", "DAIER1", "ZERV")), 
class = "data.frame", row.names = c(NA, 
-10L))
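
To see why the pattern in the answer wraps the alternation in \\b ... \\b, the small check below compares it with the bare alternation on two values from the data. This is only an illustrative sketch; the names alternation and bounded are made up for the example.

```r
library(stringr)

xa <- c("AUSTOP", "HTURIE", "IRTUFH", "JEURTU", "AITUO", "ERTUJG",
        "HERTYE", "DAIER", "ZERV1", "BERTUI")

# Bare alternation versus the word-bounded version used in the answer
alternation <- str_c(xa, collapse = "|")
bounded <- str_c("\\b(", alternation, ")\\b")

x <- c("DAIER1", "ATIP;AITUO")

# Without boundaries, "DAIER" is found inside "DAIER1"
str_extract(x, alternation)
# [1] "DAIER" "AITUO"

# With boundaries, only whole tokens match
str_extract(x, bounded)
# [1] NA      "AITUO"
```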