I have a data frame df with this structure:
Rank Review
5 good film
8 very good film
..
Then I tried to create a document-term matrix using the quanteda package:
mydfm <- dfm(df$Review, remove = stopwords("english"), stem = TRUE)
I would like to know how to calculate, for each feature (term), the chi2 value with respect to the documents, in order to extract the best features in terms of chi2 value.
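To make concrete what I mean by a chi2 value for a single feature, here is a minimal base R sketch. The counts are made up for illustration and do not come from my data, and the 2x2 layout (target document vs. all other documents, the term vs. every other term) is my assumption about how the statistic should be set up.

```r
# Toy counts, invented for illustration only (not taken from mydfm).
# Rows:    the target document vs. all other documents
# Columns: occurrences of the term vs. occurrences of every other term
counts <- matrix(c(20,  80,
                   40, 860),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(docs = c("target", "rest"),
                                 feature = c("term", "other")))

# Pearson chi-squared on the 2x2 table, without Yates' continuity correction
res <- chisq.test(counts, correct = FALSE)
chi2 <- unname(res$statistic)
print(chi2)
```

What I am after is this kind of statistic for every feature of mydfm, so that I can rank the features by it.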
Can you help me solve this problem, please?
EDIT:
head(mydfm[, 5:10])
Document-feature matrix of: 63,023 documents, 6 features (92.3% sparse).
(showing first 6 documents and first 6 features)
          features
docs      bon accueil conseillèr efficac écout répond
text1       0       0          0       0     0      0
text2       1       1          1       1     1      1
text3       0       0          0       0     0      0
text4       0       0          0       0     0      0
text5       0       0          1       0     0      0
text6       0       0          0       0     1      0
...
text60300   0       0          1       1     1      1
This is my dfm. I then create my tf-idf matrix:
tfidf <- tfidf(mydfm)[, 5:10]
I would like to determine the chi2 value between these features and the documents (here I have 60300 documents):
textstat_keyness(mydfm, target = 2)
But since I have 60300 targets, I don't know how to do this automatically. I see in the quanteda manual that the groups option of the dfm function may solve this problem, but I don't see how to use it.
EDIT 2:
Rank Review
10 always good
1 nice film
3 fine as usual
Here I try to group the documents by Rank with dfm:
mydfm <- dfm(Review, remove = stopwords("english"), stem = TRUE, groups = Rank)
But it fails to group the documents.
Can you help me solve this problem, please?
Thank you.