I have two data frames, df1:
Sentence <- c("A large bunch of purple grapes", "large green potato sack", "small red tomatoes", "yellow and black bananas")
df1 <- data.frame(Sentence)
df2:
Word <- c("green", "purple", "grapes", "small", "sack", "yellow", "bananas", "large")
Rank <- c(20,18,22,16,15,17,6,12)
df2 <- data.frame(Word,Rank)
df1:
ID Sentence
1 A large bunch of purple grapes
2 large green potato sack
3 small red tomatoes
4 yellow and black bananas
df2:
ID Word Rank
1 green 20
2 purple 18
3 grapes 22
4 small 16
5 sack 15
6 yellow 17
7 bananas 6
8 large 12
What I want to do is match the words in df2 against the words in the "Sentence" column, and add a new column to df1 containing the highest-ranking matched word from df2. So something like this:
df1:
ID Sentence Word
1 A large bunch of purple grapes grapes
2 large green potato sack green
3 small red tomatoes small
4 yellow and black bananas yellow
I initially used the following code to match words, but of course this creates a column containing all of the matched words rather than only the highest-ranking one:
x <- sapply(df2$Word, function(x) grepl(tolower(x), tolower(df1$Sentence)))
df1$top_match <- apply(x, 1, function(i) paste0(names(i)[i], collapse = " "))
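A minimal sketch of how the same logical matrix could be reduced to the single highest-ranking word per sentence. It assumes the columns are named Sentence, Word and Rank as in the printed tables, that a higher Rank value means a higher-ranking word, and that NA is acceptable when nothing matches; the name `hits` and the use of `fixed = TRUE` are illustrative choices, not part of the original code.

```r
# Data as described above (column names assumed to match the printed tables)
df1 <- data.frame(
  Sentence = c("A large bunch of purple grapes", "large green potato sack",
               "small red tomatoes", "yellow and black bananas"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  Word = c("green", "purple", "grapes", "small", "sack", "yellow", "bananas", "large"),
  Rank = c(20, 18, 22, 16, 15, 17, 6, 12),
  stringsAsFactors = FALSE
)

# Logical matrix: one row per sentence, one column per word in df2
hits <- sapply(df2$Word, function(w) {
  grepl(tolower(w), tolower(df1$Sentence), fixed = TRUE)
})

# For each sentence, keep the matched word with the largest Rank
# (NA when no word from df2 occurs in the sentence)
df1$Word <- apply(hits, 1, function(i) {
  if (!any(i)) return(NA_character_)
  df2$Word[i][which.max(df2$Rank[i])]
})

print(df1)
```

Note that this is plain substring matching, so "large" would also match inside "enlarged"; wrapping each word in `\\b` word boundaries (and dropping `fixed = TRUE`) would be one way to restrict it to whole words.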
If a sentence has no match in df2, do you want to just return NA? In this case, all sentences have a match, but I just want to make sure you are not looking for something more general. – acylam
Could you provide the data as dput(df1) and dput(df2), or as the code you used to generate them? – acylam