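The maxWordLength control option in DocumentTermMatrix doesn't seem to have any effect (no warnings, no errors). In the example below I set maxWordLength=4, yet terms such as "exceptionally" (13 characters) still show up in the matrix. I'm using tm_0.6-2 with R version 3.2.2 on a Mac. Any ideas?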
df <- Corpus(DataframeSource(data.frame(as.character("this is my test string with an exceptionally long word"))))
df.dtf <- DocumentTermMatrix(df, control = list(tokenize = BigramTokenizer,
                                                minWordLength = 2, maxWordLength = 4,
                                                minDocFreq = minFreq))
inspect(df.dtf)
yields:
<<DocumentTermMatrix (documents: 1, terms: 7)>>
Non-/sparse entries: 7/0
Sparsity           : 0%
Maximal term length: 13
Weighting          : term frequency (tf)

    Terms
Docs exceptionally long string test this with word
   1             1    1      1    1    1    1    1