I would like to manipulate (rename and combine) features in a dfm. How should I proceed?
The reason is as follows: I want to use a different stemming algorithm than the Porter stemmer implemented in quanteda (namely the kpss algorithm, called via Python).
Example: The three-word sentence c("creatief creatieve creatie") will result in a dfm with three features (i.e. "creatief", "creatieve", "creatie"), each with a term frequency of 1. However, the kpss algorithm stems all three words to "creatie". It would be very handy if I could combine these three features in the dfm into a single feature called "creatie" with a term frequency of 3.
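To make the desired result concrete, here is a minimal sketch in plain Python (not quanteda code) of the operation I have in mind. The `stem_of` lookup is a hypothetical stand-in for the output of the kpss stemmer, and `combine_features` is just a name made up for this illustration.

```python
from collections import Counter

# Term frequencies as they would appear in the dfm for the one example document.
term_counts = Counter({"creatief": 1, "creatieve": 1, "creatie": 1})

# Hypothetical stem lookup standing in for what the kpss stemmer returns.
stem_of = {"creatief": "creatie", "creatieve": "creatie", "creatie": "creatie"}


def combine_features(counts, mapping):
    """Rename each feature via `mapping` and sum the counts of features
    that end up with the same name. Unmapped features keep their name."""
    combined = Counter()
    for feature, count in counts.items():
        combined[mapping.get(feature, feature)] += count
    return combined


combined = combine_features(term_counts, stem_of)
print(dict(combined))  # → {'creatie': 3}
```

What I am looking for is the equivalent of this rename-then-sum step applied to the columns of a dfm, without first converting it to a dense matrix.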
Your help is deeply appreciated.
(Note: I understand that such data manipulations are possible after a dfm is converted into a 'simple' matrix, but I would like to do this within the dfm.)
Addendum: I overlooked the dfm_compress function. I am almost there... After I have compressed the dfm, is it also possible to apply a dictionary, e.g. so that the words 'creati' and 'innovati' are both counted as occurrences of the word category 'creati' (cf. the dictionary function in dfm)? (Note: Given the huge volume of texts, I would rather not stem the raw data files.)