
I am new to programming in R. My data is given below, and I want to extract phrases of two or more words from this corpus. My example code is also given below.

My corpus (document)


apple tops in online shopping us retail sales on apple aapl mobile devices were times higher than sales on google goog androidpowered smartphones and tablets accounting for of all online sales activity according a report from ibm smarter commerce owners of apple ios devices spent an average of per  pm et apple facing margin pressure wells fargo securities downgraded apple aapl stock to market perform from outperform saying the companys gross profit margin will come under pressure with release of its next smartphone which is likely to be called iphone as wireless service providers pull back on subsidizing the retail stock goes down and sometime move up pm et synaptics follows apple into fingerprint id market synaptics syna a leading maker of touch interfaces for computers and mobile devices is expanding into the growing market for fingerprint identification the san jose califbased company touted its november acquisition of fingerprint id company validity sensors as an integral part of its 
pm et apple ios beats google android in mobile shopping us retail sales on apple aapl mobile devices were five times higher than sales recorded on google goog androidpowered smartphones and tablets according to a report released thursday by ibm smarter commerce a unit of ibm ibm but ibm which tracks more than us retail websites found 

ultra hd curvedscreen tvs wearables big at ces sale down ultra highdefinition tvs wearable computers and sensors and consumer d printers are among the products expected to make waves next week at the consumer electronics show in las vegas officially known as the international ces it is expected to attract attendees more than  
pm et four major products apple might unveil in buy up apple aapl ceo tim cook has talked about being an exciting year for new apple products including new categories but hes been intentionally vague industry analysts have weighed in with their best guesses on what new products we can expect from apple in the year ahead


I have manually declared a dictionary for keyword extraction, but the problem is that I am not able to get the occurrences (frequency) of keywords made up of two or more words from this corpus. Any suggestions?

My code example

This is my corpus code:

corpus<-Corpus(DirSource("corpus"),readerControl=list(reader=readPlain,language="en"))

This is my dictionary:

which_words<-Dictionary(c("move up","sale","stock goes up"))

This is my matching code:

total<-(DocumentTermMatrix(corpus,list(dictionary = which_words)))

This is my result:

inspect(total)
       Terms
Docs    move up sale stock goes up
  1.txt       0    1             0
I am not sure that "move up" or "stock goes up" counts as a "single" word. - agstudy
Dear Sir, "move up" is two words, "stock goes up" is three words, and "sale" is one word. I put each of them in a single string, like "move up","sale","stock goes up". - user3222412

1 Answer


As a workaround, you can concatenate the words of each phrase into a single word before building the document-term matrix:

library(tm)

# txt is assumed to be a character vector holding the document text
txt <- gsub("move up","moveup",txt)
txt <- gsub("goes up","goesup",txt)
txt <- gsub("goes down","goesdown",txt)

corpus <- Corpus(VectorSource(txt))
which_words <- c("moveup","sale","goesup","goesdown")
total <- DocumentTermMatrix(corpus,list(dictionary = which_words))
inspect(total)
Docs goesdown goesup moveup sale
   1        1      0      1    1
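
The same substitution can be applied to a whole list of phrases instead of writing one gsub call per phrase. This is only a minimal sketch in base R; `join_phrases` is a hypothetical helper name, not a tm function, and it matches the phrases literally (so "move up" would also be joined inside "move upward"):

```r
# Hypothetical helper: collapse each multi-word phrase into one token
# so that a word-level tokenizer keeps it together.
join_phrases <- function(txt, phrases) {
  for (p in phrases) {
    joined <- gsub(" ", "", p, fixed = TRUE)   # "move up" -> "moveup"
    txt <- gsub(p, joined, txt, fixed = TRUE)  # literal (non-regex) replacement
  }
  txt
}

phrases <- c("move up", "goes up", "goes down")
txt <- "the retail stock goes down and sometime move up"
join_phrases(txt, phrases)
# "the retail stock goesdown and sometime moveup"

# The matching dictionary entries are the phrases with spaces removed
gsub(" ", "", phrases, fixed = TRUE)
```

The joined text can then be passed to Corpus(VectorSource(...)) exactly as in the code above.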

However, it would be better to look at a sentiment analysis package to do this.
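
If only the phrase frequencies are needed, they can also be counted directly, without rewriting the text. The following is a rough sketch in base R, not a tm feature: `count_phrases` is a hypothetical helper, and it assumes the phrases are plain words with no regex metacharacters. The `\\b` word boundaries keep "sale" from matching inside "sales".

```r
# Hypothetical helper: count whole-phrase occurrences in each document.
count_phrases <- function(docs, phrases) {
  counts <- vapply(phrases, function(p) {
    hits <- gregexpr(paste0("\\b", p, "\\b"), docs, perl = TRUE)
    # gregexpr returns -1 for a document with no match, so count positives only
    vapply(hits, function(h) sum(h > 0), integer(1))
  }, integer(length(docs)))
  # One row per document, one column per phrase
  matrix(counts, nrow = length(docs),
         dimnames = list(names(docs), phrases))
}

docs <- c(
  doc1 = "us retail sales on apple mobile devices the retail stock goes down and sometime move up",
  doc2 = "wearables big at ces sale down"
)
result <- count_phrases(docs, c("move up", "sale", "stock goes up", "goes down"))
print(result)
```

Here `docs` is a named character vector built from short excerpts of the corpus above; in practice it would hold the full text of each file.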