I am having trouble with the tm package in R (version 0.6.2). The question below involves two different errors that have already been answered here and here, but I still get an error after applying the posted solutions. Please click here to download the dataset (93 rows only), so the example is reproducible. The two errors are:
(Resolved) Error in UseMethod("meta", x) : no applicable method for 'meta' applied to an object of class "character"
Error: inherits(doc, "TextDocument") is not TRUE
- tm_map(ds.corpus, PlainTextDocument) does not create a plain text document in this case. inherits(ds.cleanCorpus, "TextDocument") # returns FALSE
Please tell me what is wrong with my approach.
--
# Data import
df.imp<- read.csv("Phone2_Sample100_NegPos.csv", header = TRUE, as.is = TRUE)
##### Data Pre-Processing
install.packages("tm")
require(tm)
ds.corpus<- Corpus(VectorSource(df.imp$Content))
ds.corpus<- tm_map(ds.corpus, content_transformer(tolower))
ds.corpus<- tm_map(ds.corpus, content_transformer(removePunctuation))
ds.corpus<- tm_map(ds.corpus, content_transformer(removeNumbers))
removeURL<- function(x) gsub("http[[:alnum:]]*", "", x)
ds.corpus<- tm_map(ds.corpus,removeURL)
stopwords.default<- stopwords("english")
stopWordsNotDeleted<- c("isn't" , "aren't" , "wasn't" , "weren't" , "hasn't" ,
"haven't" , "hadn't" , "doesn't" , "don't" ,"didn't" ,
"won't" , "wouldn't", "shan't" , "shouldn't", "can't" ,
"cannot" , "couldn't" , "mustn't", "but","no", "nor", "not", "too", "very")
stopWord.new<- stopwords.default[! stopwords.default %in% stopWordsNotDeleted] ## new Stopwords list
ds.corpus<- tm_map(ds.corpus, removeWords, stopWord.new )
copy<- ds.corpus ## creating a copy to be used as a dictionary
ds.corpus<- tm_map(ds.corpus, stemDocument)
## error Statement #1
ds.corpus<- stemCompletion(ds.corpus, dictionary = copy)
## Error in UseMethod("meta", x) : no applicable method for 'meta' applied to an object of class "character"
ds.cleanCorpus<- tm_map(ds.corpus, PlainTextDocument) ## creating plain text document
class(ds.cleanCorpus) ## output is "VCorpus" "Corpus"; what should it be?
## error Statement #2
tdm<- TermDocumentMatrix(ds.corpus) ## creating term document matrix
inherits(ds.cleanCorpus, "TextDocument") ## returns FALSE
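A possible cause of the second error, offered as a guess rather than a confirmed diagnosis: as far as I understand tm 0.6, a plain function such as removeURL passed directly to tm_map returns bare character vectors, so the documents stop being TextDocument objects. Wrapping it in content_transformer keeps the document class. A minimal sketch on made-up text (the two sample strings are assumptions, not rows from the dataset):

```r
library(tm)

# Same helper as in the question: strips tokens that start with "http"
removeURL <- function(x) gsub("http[[:alnum:]]*", "", x)

# Toy corpus standing in for df.imp$Content (hypothetical data)
corpus <- VCorpus(VectorSource(c("see httpexamplecom now", "no link here")))

# content_transformer() applies removeURL to the text only and
# returns a document of the original class
corpus <- tm_map(corpus, content_transformer(removeURL))

# Each document should still be a TextDocument, so this call should work
tdm <- TermDocumentMatrix(corpus)
print(inherits(corpus[[1]], "TextDocument"))
```

If this is indeed the cause, the same wrapping would apply to any other custom function passed to tm_map in the pipeline.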
Update: I figured out the first error: the x parameter of stemCompletion should be a character vector, while dictionary can be either a corpus or a character vector. However, when I tried it on the content of the first document of ds.corpus (a character vector), as below, the stemmed words were not completed; the output is just the same stemmed character vector as before.
stemCompletion(ds.corpus[[1]]$content, dictionary = copy)
So now my main question is: "How do I complete a stemmed corpus from a dictionary (tm package)?" The stemCompletion method doesn't seem to work (on a character vector). Secondly, how can I complete the stems of an entire corpus? Should I use a for loop over the content of each document in the corpus?
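One approach that might work, sketched under assumptions: stemCompletion appears to treat each element of x as a single stem, so a whole document passed as one string is not completed. Splitting the text into words first, completing them, and pasting them back together avoids an explicit for loop when combined with tm_map. The helper name completeStems, the toy dictionary, and the sample texts are all hypothetical, and type = "first" is just one of the available completion heuristics:

```r
library(tm)

# Hypothetical helper (not part of tm): complete every stem in one document
completeStems <- function(x, dictionary) {
  # Split the document text into individual stems
  stems <- unlist(strsplit(as.character(x), "\\s+"))
  stems <- stems[nzchar(stems)]
  if (length(stems) == 0) return("")
  # Complete each stem against the dictionary, taking the first match
  completed <- unname(stemCompletion(stems, dictionary = dictionary,
                                     type = "first"))
  # Keep the original stem where no completion was found
  missing <- is.na(completed) | !nzchar(completed)
  completed[missing] <- stems[missing]
  paste(completed, collapse = " ")
}

# Toy dictionary and stemmed corpus (assumed data, not from the dataset)
dict <- c("computer", "mining", "stemming")
corpus <- VCorpus(VectorSource(c("comput min", "stem xyz")))

# Apply the helper to every document; extra arguments are passed through
completed <- tm_map(corpus, content_transformer(completeStems),
                    dictionary = dict)
print(content(completed[[1]]))
```

In the question's pipeline, the dictionary would presumably be the unstemmed copy (or a character vector of its words) and the corpus would be ds.corpus after stemDocument.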
See ?stemCompletion. In stemCompletion(ds.corpus, stemCompletion, dictionary = copy) you are passing an object of type Corpus to an argument that should be of type character, and... well... I dunno where the 2nd argument stemCompletion should go. Maybe you should clarify what you are trying to accomplish...? – lukeA