I am trying to do some analysis on Twitter data. I have the words of a tweet in a character vector:

> head(words)
[1] "#fabulous" "rock" "is" "#destined" "to" "be" "star"

> head(hashtags)
      hashtags score
1    #fabulous 7.526
2   #excellent 7.247
3      #superb 7.199
4  #perfection 7.099
5    #terrific 6.922
6 #magnificent 6.672

I want to check the words character vector against the hashtags data frame and, for every match, add up the corresponding scores. In the case above, I want the output to be 7.526 + 6.922 = 14.448.

Any help would be greatly appreciated.

1 Answer

Try this:

words_hashtags <- words[grepl('^#', words)]
scores <- hashtags[hashtags$hashtags %in% words_hashtags, 'score']
sum(scores)

grepl returns a logical vector indicating which words start with a hashtag (#). The %in% test then selects the rows of hashtags whose hashtag occurs among those words, and sum adds up their scores.
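
As a minimal, self-contained sketch of these steps, the snippet below rebuilds the sample data from the question and runs the same three lines. It assumes hashtags contains only the six rows the question displays, which is probably not the full data frame. With just those six rows, only #fabulous matches, so the sum is 7.526; #destined would contribute only if it appears further down in the real table.

```r
# Sample data copied from the question; the real `hashtags` data frame
# presumably has more rows than the six shown by head().
words <- c("#fabulous", "rock", "is", "#destined", "to", "be", "star")
hashtags <- data.frame(
  hashtags = c("#fabulous", "#excellent", "#superb",
               "#perfection", "#terrific", "#magnificent"),
  score = c(7.526, 7.247, 7.199, 7.099, 6.922, 6.672),
  stringsAsFactors = FALSE
)

# Keep only the words that start with '#'
words_hashtags <- words[grepl("^#", words)]

# Select the scores of the rows whose hashtag occurs among those words
scores <- hashtags[hashtags$hashtags %in% words_hashtags, "score"]

# Add up the matched scores
total <- sum(scores)
print(total)
```

Note that %in% tests each row of hashtags once, so a hashtag that occurs several times in words is still counted only once.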

More options to get words_hashtags:

words_hashtags <- grep('^#', words, value=TRUE)
words_hashtags <- words[grep('^#', words, value=FALSE)]
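
The different ways of building words_hashtags can be checked against each other. This is a small sketch using the sample words from the question; the names via_grepl, via_value and via_index are only illustrative and do not come from the original answer.

```r
# Sample words from the question
words <- c("#fabulous", "rock", "is", "#destined", "to", "be", "star")

# Logical indexing with grepl
via_grepl <- words[grepl("^#", words)]

# grep returning the matching elements directly
via_value <- grep("^#", words, value = TRUE)

# grep returning integer positions (the default), used as an index
via_index <- words[grep("^#", words)]

# Both comparisons should report that the results are identical
print(identical(via_grepl, via_value))
print(identical(via_grepl, via_index))
```

All three give the same character vector, so the choice is mostly a matter of taste; spelling out TRUE and FALSE is a little safer than T and F, which are ordinary variables that can be reassigned.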