1
votes

I am using quanteda to build two document feature matrices:

library(quanteda)
DFM1 <- dfm("this is a rock")
#        features
# docs    this is a rock
#   text1    1  1 1    1
DFM2 <- dfm("this is music")
#        features
# docs    this is music
#   text1    1  1     1

However, I want DFM2 to have a specific set of features, namely the ones from DFM1:

DFM2 <- dfm("this is music", *magicargument* = featnames(DFM1))
#        features
# docs    this is a rock
#   text1    1  1 0    0

Is there a magicargument that I am missing? Or is there another efficient way to achieve this for large bags of words?


1 Answer

2
votes

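The magic argument is pattern, in dfm_select() rather than dfm(): supply a dfm whose features will be matched. Features of that dfm that are missing from the target dfm are added with zero counts, and features of the target that are not in it are dropped: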

dfm_select(DFM2, pattern = DFM1)
# Document-feature matrix of: 1 document, 4 features (50% sparse).
# 1 x 4 sparse Matrix of class "dfmSparse"
#        features
# docs    this is a rock
#   text1    1  1 0    0
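
As far as I know, more recent quanteda releases provide dfm_match() for this kind of feature alignment, and dfm() there expects a tokens object rather than a raw character vector. The sketch below assumes such a release is installed; the name DFM2_matched is only illustrative.

```r
library(quanteda)

# Assumption: a recent quanteda release, where dfm() takes a tokens object
DFM1 <- dfm(tokens("this is a rock"))
DFM2 <- dfm(tokens("this is music"))

# Conform DFM2 to the feature set (and feature order) of DFM1:
# features absent from DFM2 are padded with zeros, extra features are dropped
DFM2_matched <- dfm_match(DFM2, features = featnames(DFM1))

featnames(DFM2_matched)
```

Because the result is still a sparse dfm, this should remain workable for large vocabularies.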