I'm trying to run the AssociatedPress dataset (shipped with the topicmodels package and stored as a tm DocumentTermMatrix) through text2vec's LDA implementation.
The problem I'm facing is an incompatibility of data types: AssociatedPress is a tm::DocumentTermMatrix, which in turn is a subclass of slam::simple_triplet_matrix. text2vec, however, expects the input x to the fit_transform(x = ...) method of text2vec::LDA to be a Matrix::dgTMatrix.
So my question is: is there a way to coerce a DocumentTermMatrix to something text2vec accepts?
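A possible approach, sketched under assumptions: a slam::simple_triplet_matrix is a plain list holding 1-based row indices in $i, column indices in $j, the non-zero values in $v, plus $nrow, $ncol and $dimnames. Its triplets can therefore be passed straight to Matrix::sparseMatrix() and the result coerced to the triplet (dgTMatrix) representation. The helper name stm_to_dgT() is hypothetical, and this has not been checked against every text2vec version.

```r
library(Matrix)

# Hypothetical helper: convert a slam::simple_triplet_matrix (the parent
# class of tm::DocumentTermMatrix) into a Matrix::dgTMatrix.
stm_to_dgT <- function(stm) {
  # sparseMatrix() takes 1-based indices by default, matching slam's $i/$j;
  # the counts in $v are stored as doubles, giving a "d" (double) matrix.
  m <- Matrix::sparseMatrix(
    i = stm$i,
    j = stm$j,
    x = as.numeric(stm$v),
    dims = c(stm$nrow, stm$ncol)
  )
  # Carry over document and term names when they exist.
  if (!is.null(stm$dimnames)) dimnames(m) <- stm$dimnames
  # sparseMatrix() returns a column-compressed dgCMatrix; coerce it to the
  # triplet representation.
  as(m, "TsparseMatrix")
}
```

In the failing example, this would mean passing x = stm_to_dgT(dtm) instead of x = dtm. If text2vec turns out to want a column-compressed matrix instead, dropping the final as(m, "TsparseMatrix") step returns the dgCMatrix directly.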
Minimal (failing) example:
library('tm')
library('text2vec')

data("AssociatedPress", package = "topicmodels")
dtm <- AssociatedPress[1:10, ]

lda_model = LDA$new(
  n_topics = 10,
  doc_topic_prior = 0.1,
  topic_word_prior = 0.01
)

doc_topic_distr =
  lda_model$fit_transform(
    x = dtm,
    n_iter = 1000,
    convergence_tol = 0.001,
    n_check_convergence = 25,
    progressbar = FALSE
  )
...which gives:
base::rowSums(x, na.rm = na.rm, dims = dims, ...) : 'x' must be an array of at least two dimensions