I'm creating a document-term matrix with the tm package in R, but some of the words in my corpus get lost somewhere in the process.
Here is an example. Say I have this small corpus:
library(tm)
crps <- " more hours to my next class bout to go home and go night night"
crps <- VCorpus(VectorSource(crps))
When I call DocumentTermMatrix() from the tm package, it returns the following:
dm <- DocumentTermMatrix(crps)
dm_matrix <- as.matrix(dm)
dm_matrix
#     Terms
# Docs and bout class home hours more next night
#    1   1    1     1    1     1    1    1     2
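To see exactly which words went missing, one can tokenize the raw text independently of tm and compare the result with the DTM's vocabulary. The sketch below splits on whitespace with base R `strsplit()`. This assumes plain space-separated words with no punctuation, which holds for this example. It then uses tm's `Terms()` to list the DTM columns. The names `token_counts` and `missing_terms` are illustrative only.

```r
library(tm)

# Rebuild the example corpus and its document-term matrix with default settings
crps <- VCorpus(VectorSource(" more hours to my next class bout to go home and go night night"))
dm <- DocumentTermMatrix(crps)

# Tokenize the raw text on whitespace ourselves, bypassing tm's filtering
txt <- content(crps[[1]])
tokens <- strsplit(trimws(txt), "\\s+")[[1]]
token_counts <- table(tokens)
print(token_counts)

# Words that occur in the text but have no column in the DTM
missing_terms <- setdiff(names(token_counts), Terms(dm))
print(missing_terms)

# Character length of each missing word
print(nchar(missing_terms))
```

If every missing word is shorter than three characters, a length filter is the likely cause rather than a stopword list. Stopword removal is off by default in `DocumentTermMatrix()`.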
However, what I want (and expected) is:
# Docs and bout class home hours more next night my go to
#    1   1    1     1    1     1    1    1     2  1  2  2
Why does DocumentTermMatrix() skip the words "my", "go" and "to"? Is there a way to control this behaviour?
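The likely culprit is the `wordLengths` control option documented for tm's `termFreq()`. It defaults to `c(3, Inf)` and so discards every word shorter than three characters. Assuming that is the filter at work, the sketch below overrides it through the `control` list of `DocumentTermMatrix()`. The variable names `dm_all` and `dm_all_matrix` are illustrative.

```r
library(tm)

crps <- VCorpus(VectorSource(" more hours to my next class bout to go home and go night night"))

# Lower the minimum word length from the default of 3 to 1 so that
# one- and two-letter words such as "my", "go" and "to" are kept
dm_all <- DocumentTermMatrix(crps, control = list(wordLengths = c(1, Inf)))
dm_all_matrix <- as.matrix(dm_all)
print(dm_all_matrix)
```

With this setting, "go" and "to" should each be counted twice and "my" once, alongside the eight terms from before. The columns come out in tm's own term order, not the order written in the expected output. If the order matters, index the columns by name, e.g. `dm_all_matrix[, c("and", "my", "go", "to")]`.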
...tm-package? What kind of object is crps? How did you get crps? Did you use something like crps <- Corpus(VectorSource(some_text_string))? - KenHBS
crps <- VCorpus(VectorSource(My_text)) - Fouad Selmane