
The maxWordLength argument in DocumentTermMatrix doesn't seem to have any effect (no warnings, no errors). I'm using tm_0.6-2 with R version 3.2.2 on a Mac. Any ideas?

df <- Corpus(DataframeSource(data.frame(as.character("this is my test string with an exceptionally long word"))))
df.dtf <- DocumentTermMatrix(df, control = list(tokenize = BigramTokenizer, minWordLength = 2, maxWordLength = 4, minDocFreq = minFreq))

inspect(df.dtf)

yields:

<<DocumentTermMatrix (documents: 1, terms: 7)>>
Non-/sparse entries: 7/0
Sparsity           : 0%
Maximal term length: 13
Weighting          : term frequency (tf)

    Terms
Docs exceptionally long string test this with word
   1             1    1      1    1    1    1    1


1 Answer


This works for me, if I understood you correctly and you just want to limit the maximum word length:

df.dtf <- DocumentTermMatrix(df, control = list( wordLengths=c(1,4)))
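For context, here is a minimal sketch of the same fix applied to the sentence from the question. It assumes a tm version in which `wordLengths` (a length-2 vector giving the minimum and maximum number of characters) is the control option actually read when terms are counted, which is my understanding for tm 0.6. `VCorpus(VectorSource(...))` is swapped in for the question's `DataframeSource` only to keep the example short, and the names `docs`, `dtm` and `kept` are illustrative.

```r
library(tm)

# Build a one-document corpus from the test sentence
# (VectorSource is an assumption made for brevity, not the asker's setup).
docs <- VCorpus(VectorSource(
  "this is my test string with an exceptionally long word"
))

# Keep only terms that are between 1 and 4 characters long.
dtm <- DocumentTermMatrix(docs, control = list(wordLengths = c(1, 4)))

# List the terms that survived the length filter.
kept <- Terms(dtm)
print(kept)
```

With bounds of 1 and 4, "string" and "exceptionally" should drop out, while the short words "is", "my" and "an" are kept. Use c(2, 4) to mirror the minWordLength=2 from the question. Option names that the control list does not recognise, such as maxWordLength, appear to be silently ignored, which would explain why there were no warnings.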