I have a dataset with multiple texts per user. I want to build a corpus of all those documents with Quanteda without losing the ability to link each text back to the corresponding user.
Here is some sample code that shows where I am stuck:
library(quanteda)
df <- data.frame('ID' = c(1, 1, 2), 'Text' = c('I ate apple', "I don't like fruits", "I swim in the dark"), stringsAsFactors = FALSE)
df_corpus <- corpus(df$Text, docnames = df$ID)
corpus_DFM <- dfm(df_corpus, tolower = TRUE, stem = FALSE)
print(corpus_DFM)
This results in:
Document-feature matrix of: 3 documents, 10 features (60.0% sparse).
3 x 10 sparse Matrix of class "dfm"
     features
docs  i ate apple don't like fruits swim in the dark
  1   1   1     1     0    0      0    0  0   0    0
  1.1 1   0     0     1    1      1    0  0   0    0
  2   1   0     0     0    0      0    1  1   1    1
>
But I would like my document-feature matrix to look like this, with the user ID kept as a column:
Document-feature matrix of: 3 documents, 10 features (60.0% sparse).
3 x 10 sparse Matrix of class "dfm"
     features
docs  id i ate apple don't like fruits swim in the dark
text1  1 1   1     1     0    0      0    0  0   0    0
text2  1 1   0     0     1    1      1    0  0   0    0
text3  2 1   0     0     0    0      0    1  1   1    1
>
Is there a way to automate this process using Quanteda? I would like to modify the docs column of the dfm object, but I do not know how to access it.
Any help would be welcome!
Thank you.