I have created the corpus, defined the stopwords, and done the cleaning (removePunctuation, removeNumbers, tolower...).
The corpus is now ready to be stemmed. The stemming function runs correctly and everything works as it should, but...
I need to know which words are being stemmed to each common root. Is that possible using the tm package? Or any other package?
For example, TermA1, TermA2, TermB1, TermB2 and TermB3 are all stemmed to Term, and my new corpus contains only Term. However, I also need to know which original words are associated with each root, so the ideal output would be:
Term Stem
TermA1 Term
TermA2 Term
TermB1 Term
TermB2 Term
TermB3 Term
...
WordA1 Word
WordB1 Word
WordB2 Word
WordB3 Word
WordC1 Word
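
To make the desired output concrete, here is a rough sketch of the kind of table I mean. It assumes the SnowballC package (as far as I know, its wordStem function is what tm's stemDocument calls) and uses a made-up vocabulary in place of my real terms, so treat it as an illustration rather than a working solution for a whole corpus.

```r
library(SnowballC)  # assumption: same stemmer that tm::stemDocument relies on

# Hypothetical vocabulary. In practice this would be the unique terms of the
# un-stemmed corpus, e.g. Terms(DocumentTermMatrix(corpus)).
terms <- c("connect", "connected", "connecting", "connection", "word", "words")

# Stem each term once and keep the original term next to its stem.
stem_map <- data.frame(
  Term = terms,
  Stem = wordStem(terms, language = "english"),
  stringsAsFactors = FALSE
)

# Sort by stem so that all terms sharing a root appear together.
stem_map <- stem_map[order(stem_map$Stem, stem_map$Term), ]
print(stem_map, row.names = FALSE)

# Alternative view: the original terms grouped under each stem.
groups <- split(stem_map$Term, stem_map$Stem)
print(groups)
```

The key point is that the mapping has to be built from the terms before stemming, since the stemmed corpus no longer contains the original words.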