I'm attempting to use removeWords in the R tm package using the following code:

docs <- tm_map(docs, removeWords, stopwords("english")) 

and I get the following error message:

Error in sort (words, decreasing = TRUE) :
   argument "words" is missing, with no default

All of the other transformations I've attempted on my corpus have worked as intended (tolower, removeNumbers, stripWhitespace, removePunctuation, etc.), but I cannot get removeWords to work properly, and I cannot find anything online about this particular error message.

I'd very much appreciate any insight into what might be causing this error.

Edit: My corpus consists of HTML documents, all located in the same folder. The code I'm using to test the removeWords transformation is as follows:

setwd("C:/folder")
library(RCurl)
library(XML)
library(tm)
library(SnowballC)
docs <- Corpus(DirSource("C:/folder"))
docs <- tm_map(docs, removePunctuation)
docs <- tm_map(docs, tolower)
docs <- tm_map(docs, removeNumbers)
docs <- tm_map(docs, removeWords, stopwords("english"))
Using the built-in sample data, this seems to work: data(crude); tm_map(crude, removeWords, stopwords("english")). You should provide a reproducible example to make it clear how your situation is different. What you have provided should work. Perhaps provide the version information from sessionInfo(). - MrFlick
Thanks MrFlick - I've edited the original post. - ChrisB
Well, that doesn't really help with reproducibility since it relies on files that exist only on your machine. But I'd guess the problem might be with tolower. Try docs <- tm_map(docs, content_transformer(tolower)). Also, I assume removePucntuation is just a typo? - MrFlick
Still the same error message after using content_transformer(tolower). And yeah, that was just a typo. Regarding a reproducible example, the code has worked for me on simple test data, but the issue pops up when I apply it to the corpus of html documents. - ChrisB
Well, then that error doesn't make a lot of sense. Maybe try just one document. Try the smallest possible document. Unless the error is reproducible, it's not going to be easy to help you. Maybe include the traceback() and verify the value of class(docs) before running removeWords. Also, I assume you are using tm_map and not tm_maps as you've typed. It's important that the code you share accurately reflects what you are actually running -- that's the whole point! - MrFlick

1 Answer

Try passing the words to remove explicitly, as a character vector, to the removeWords function.

Example:

corpus = tm_map(corpus, removeWords, c("apple", stopwords("english")))
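
For what it's worth, the error text itself says that no value for the words argument reached the function that calls sort(). The base R sketch below reproduces that kind of failure without tm. Note that remove_words_sketch is a hypothetical stand-in written for illustration, not tm's actual removeWords, and it does not explain why the argument went missing in the original setup.

```r
# Hypothetical stand-in for a word-removal function (NOT tm's implementation).
# It builds one regex alternation from `words` and deletes whole-word matches.
remove_words_sketch <- function(x, words) {
  pattern <- sprintf("\\b(%s)\\b",
                     paste(sort(words, decreasing = TRUE), collapse = "|"))
  gsub(pattern, "", x, perl = TRUE)
}

# Calling it without `words` fails only when `words` is first used,
# because R evaluates function arguments lazily.
msg <- tryCatch(
  remove_words_sketch("the cat sat on the mat"),
  error = function(e) conditionMessage(e)
)
print(msg)

# Supplying the argument explicitly works.
cleaned <- remove_words_sketch("the cat sat on the mat",
                               words = c("the", "on"))
print(cleaned)
```

If the real call still fails with the words supplied, checking traceback() right after the error, as suggested in the comments, should show which function was called without them.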