tm is throwing an error when I try to create a document term matrix
library(tm)
data(crude)
# control parameters
dtm.control <- list(
  tolower = TRUE,
  removePunctuation = TRUE,
  removeNumbers = TRUE,
  stopWords = stopwords("english"),
  stemming = TRUE, # false for sentiment
  wordLengths = c(3, "inf"))
dtm <- DocumentTermMatrix(corp, control = dtm.control)
Error:
Error in simple_triplet_matrix(i = i, j = j, v = as.numeric(v), nrow = length(allTerms), :
  'i, j, v' different lengths
In addition: Warning messages:
1: In mclapply(unname(content(x)), termFreq, control) :
  all scheduled cores encountered errors in user code
2: In simple_triplet_matrix(i = i, j = j, v = as.numeric(v), nrow = length(allTerms), :
  NAs introduced by coercion
What am I doing wrong?
Also, I am using these tutorials:
Are there better or more recent walkthroughs?
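
A possible cause, offered as a guess rather than a confirmed diagnosis: `c(3, "inf")` mixes a number with a string, so R coerces the whole vector to character, which would fit the "NAs introduced by coercion" warning. The sketch below uses the numeric constant `Inf` instead. It also assumes two other things not stated in the post: that `corp` was meant to be the `crude` corpus loaded by `data(crude)`, and that the option name should be lowercase `stopwords`, the spelling `termFreq` documents. Stemming is switched on only if the SnowballC package is installed, since tm relies on it for stemming.

```r
library(tm)
data(crude)

# Control parameters, with the assumed fixes marked.
dtm.control <- list(
  tolower = TRUE,
  removePunctuation = TRUE,
  removeNumbers = TRUE,
  stopwords = stopwords("english"),   # assumption: lowercase option name
  # Stem only when SnowballC is available (tm needs it for stemming).
  stemming = requireNamespace("SnowballC", quietly = TRUE),
  wordLengths = c(3, Inf))            # numeric Inf, not the string "inf"

# Assumption: the corpus is `crude`; `corp` is not defined in the snippet.
dtm <- DocumentTermMatrix(crude, control = dtm.control)
inspect(dtm[1:5, 1:5])
```

If the error persists with these changes, running `termFreq(crude[[1]], dtm.control)` on a single document can help, because it avoids `mclapply` and tends to surface the underlying error message directly.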