I have a dataframe with predefined words/phrases.

Example: df$term

stock
revenue
continuous improvement

and another dataframe (df2), which has a single column with many rows, each row containing a text. Example: df2$sentence

I used to study at university and in my free time observe the stock prices. Additionally the revenue of every stock
Stock market is my first interest
I always try to continuous improvement

Using the terms from df, I would like to detect which terms occur in each row and get output like this:

row_number,stock,continuous improvement,revenue
1,1,0,1
2,1,0,0
3,0,1,0

Is there a simple way to do this?

1 Answer

You can do this as follows:

# Create some fake data
words <- c("stock", "revenue", "continuous improvement")
phrases <- c("blah blah stock and revenue", "yada yada revenue yada", 
             "continuous improvement is an unrealistic goal", 
             "phrase with no match")

# Apply 'grepl' to each word in turn (each word is treated as a case-sensitive
# regular expression) and convert the logical result to numeric 0/1
df <- data.frame(lapply(words, function(word) {as.numeric(grepl(word, phrases))}))
# Name the columns after the words that were searched
names(df) <- words
df
#   stock revenue continuous improvement
# 1     1       1                      0
# 2     0       1                      0
# 3     0       0                      1
# 4     0       0                      0

I have not added a column with the row numbers here, but you can add one if you need it with df$row.number <- seq_len(nrow(df)).
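
One detail worth noting: grepl is case-sensitive by default, so "Stock market" in the question's second row would not match the term "stock". The sketch below applies the same approach to the question's own data (df$term and df2$sentence as described there), adds ignore.case = TRUE, and includes the row number column. The names hits and result are made up for this illustration.

```r
# Data as described in the question
df  <- data.frame(term = c("stock", "revenue", "continuous improvement"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(sentence = c(
  "I used to study at university and in my free time observe the stock prices. Additionally the revenue of every stock",
  "Stock market is my first interest",
  "I always try to continuous improvement"),
  stringsAsFactors = FALSE)

# One 0/1 vector per term; ignore.case = TRUE lets "Stock" match "stock"
hits <- lapply(df$term, function(term) {
  as.integer(grepl(term, df2$sentence, ignore.case = TRUE))
})

# Combine with a row number column, then name the columns after the terms
result <- data.frame(seq_len(nrow(df2)), hits)
names(result) <- c("row_number", df$term)
result
```

Each term is still interpreted as a regular expression and matched as a substring, so "stock" would also match "stockholder". If whole-word matches are needed and the terms contain no regex metacharacters, one option is to search for paste0("\\b", term, "\\b") instead of term.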