I am trying to do some text mining with the tm package on reviews written by Italian users of a certain website. I scraped the texts, stored them in a corpus, and did some cleaning, but when I try to stem the words by removing common endings, I have a problem specifying Italian as the language instead of the default, English.
reviews_corpus <- tm_map(reviews_corpus, removeNumbers)
reviews_corpus <- tm_map(reviews_corpus, removePunctuation)
reviews_corpus <- tm_map(reviews_corpus, stripWhitespace)
reviews_corpus <- tm_map(reviews_corpus, content_transformer(tolower))
reviews_corpus <- tm_map(reviews_corpus, removeWords, stopwords("italian"))
reviews_corpus <- tm_map(reviews_corpus, stemDocument(reviews_corpus, language="italian"))
The first five lines work fine, but for the last one R gives me:
Error in UseMethod("stemDocument", x) :
no applicable method for 'stemDocument' applied to an object of class "c('VCorpus', 'Corpus')"
So my question is: how can I use stemDocument on a corpus while specifying the language I want it to use?
Instead of running stemDocument on the corpus, can you perform it earlier, before turning the text into a corpus? - Ronak Shah
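For what it's worth, the error message suggests that stemDocument() is being called directly on the whole corpus inside tm_map(), and it has no method for a VCorpus. A minimal sketch of the usual pattern follows, where the function itself is passed and tm_map() forwards the extra argument to it for each document. It assumes the tm and SnowballC packages are installed; the two example reviews are made up and only stand in for the scraped data.

```r
library(tm)  # stemDocument() relies on the SnowballC package being installed

# Hypothetical toy data standing in for the scraped reviews
reviews <- c("Le recensioni erano molto positive",
             "Ho mangiato benissimo in questo ristorante")
reviews_corpus <- VCorpus(VectorSource(reviews))

# Same kind of cleaning steps as in the question
reviews_corpus <- tm_map(reviews_corpus, content_transformer(tolower))
reviews_corpus <- tm_map(reviews_corpus, removeWords, stopwords("italian"))

# Pass the function itself, not the result of calling it on the corpus;
# language = "italian" is forwarded to stemDocument() for each document
reviews_corpus <- tm_map(reviews_corpus, stemDocument, language = "italian")

# Collect the stemmed text of each document into a character vector
stemmed <- vapply(
  seq_along(reviews_corpus),
  function(i) paste(as.character(reviews_corpus[[i]]), collapse = " "),
  character(1)
)
print(stemmed)
```

The only change relative to the last line in the question is that stemDocument appears without parentheses and the language is given as a separate argument to tm_map().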