
I'm dealing with topic modeling for short text and have come across three models designed for this purpose: the biterm topic model (BTM), the word network topic model (WNTM), and latent-feature LDA (LF-LDA).

I know that for conventional LDA (I have implemented it using the R package topicmodels), unstructured text documents are converted into a computer-readable format by constructing a document-term matrix (DTM).
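For reference, DTM construction can be sketched as below. This is a minimal hand-rolled illustration in Python rather than R (the question uses topicmodels/tm, whose `DocumentTermMatrix` does the same thing); the toy corpus is made up for the example:

```python
from collections import Counter

# Toy corpus: each document becomes one row of the DTM.
docs = ["the house is white", "the white house is big"]

# Shared vocabulary defines the column order of the matrix.
vocab = sorted({w for d in docs for w in d.split()})

# Each cell counts how often a term occurs in a document;
# word order within a document is discarded.
dtm = [[Counter(d.split())[w] for w in vocab] for d in docs]
```

The resulting matrix is purely a bag-of-words representation, which is the property the answer below contrasts against window-based models.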

I'm wondering whether the models mentioned above are implemented in a similar way, in particular whether they also construct a matrix comparable to the DTM. Does anyone know? Unfortunately, I couldn't find this information in the original papers.

Thank you in advance!

Since your question is less a programming issue and more a general question about models and their structure, you might consider asking on Cross Validated instead of SO. To my knowledge, there is no R implementation of topic modelling yet that covers models other than LDA or CTM (VEM or Gibbs). The corresponding packages would be topicmodels, lda, or text2vec, each using slightly different sampling/estimation algorithms. – Manuel Bickel

1 Answer


BTM and TKM (which might also work well for short texts - https://github.com/JohnTailor/tkm) do not construct a document-term matrix (DTM). WNTM might construct one; I don't know about LF-LDA.

BTM, WNTM and TKM take the position of a word into account using sliding windows, so, e.g., "The house is white" and "The white house is" might give different results under certain settings. A DTM does not capture word order, i.e., both of the above examples yield the same DTM.

WNTM might benefit from a DTM when inferring the topic-document distribution, but it does not need one for the inference of its parameters (the word-topic distributions).
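The contrast between a DTM and window-based co-occurrence can be made concrete. The sketch below (a simplified illustration, not the exact extraction procedure of any of these implementations; the window size of 3 is an arbitrary choice) shows that the two example sentences produce identical DTM rows but different sets of biterms, i.e., unordered word pairs co-occurring within a sliding window:

```python
from collections import Counter

def dtm_row(doc, vocab):
    """Bag-of-words count vector: ignores word order entirely."""
    counts = Counter(doc)
    return [counts[w] for w in vocab]

def biterms(doc, window=3):
    """All unordered word pairs co-occurring within a sliding window."""
    pairs = set()
    for i in range(len(doc)):
        for j in range(i + 1, min(i + window, len(doc))):
            pairs.add(tuple(sorted((doc[i], doc[j]))))
    return pairs

d1 = "the house is white".split()
d2 = "the white house is".split()
vocab = sorted(set(d1))

# Same bag of words -> identical DTM rows...
print(dtm_row(d1, vocab) == dtm_row(d2, vocab))  # -> True
# ...but different biterm sets, since window co-occurrence
# depends on word order.
print(biterms(d1, 3) == biterms(d2, 3))  # -> False
```

This is why a plain DTM cannot feed the parameter inference of these models: the information they exploit (which words appear near each other) is already lost at that point.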