I am currently doing an LDA analysis using Python and the Gensim Mallet wrapper. After training the model and getting the topics, I want to see how the topics are distributed over the various documents. With Gensim's own LDA model (LdaModel), I could use the get_document_topics function to iterate over every document in my file. However, the Mallet wrapper does not have this function. I can retrieve the topic distribution for one specific document, but I can't find a way to collect and store it for every document (for instance, in a list or a dataframe).
I can get the topic distribution for a single document with the following code:
print(ldamallet[mm[6000]])
which returns the following output:
[(0, 0.3055555555555555), (1, 0.3253968253968254), (2, 0.36904761904761907)]
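The output is a list of (topic_id, proportion) pairs, one per topic, and the proportions should add up to one. A minimal sketch for reading one such list, assuming exactly that shape; the helper name `dominant_topic` is hypothetical, and the values are the ones printed above.

```python
import math

# Output printed above for ldamallet[mm[6000]]
dist = [(0, 0.3055555555555555), (1, 0.3253968253968254), (2, 0.36904761904761907)]

def dominant_topic(dist):
    """Hypothetical helper: return the (topic_id, proportion) pair with the largest proportion."""
    return max(dist, key=lambda pair: pair[1])

# The proportions over all topics should sum to (approximately) one.
assert math.isclose(sum(p for _, p in dist), 1.0)

print(dominant_topic(dist))  # (2, 0.36904761904761907)
```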
However, I can't get it to iterate over the roughly 9,000 documents in my dataset.
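As far as I know, in the Gensim 3.x LdaMallet wrapper, indexing with a whole corpus (`ldamallet[mm]`) instead of a single document returns one list of (topic_id, proportion) pairs per document, and it runs Mallet's inference once for the corpus rather than once per document; `ldamallet.load_document_topics()` should likewise return the distributions Mallet saved for the training documents. The sketch below assumes that output shape and turns it into a plain list of rows, one per document. `doc_topic_rows` is a hypothetical helper, and the second sample document is made up for illustration (the first is the output shown above).

```python
from typing import Iterable, List, Sequence, Tuple

TopicDist = Sequence[Tuple[int, float]]

def doc_topic_rows(doc_topics: Iterable[TopicDist], num_topics: int) -> List[list]:
    """Turn per-document (topic_id, proportion) lists into dense rows.

    Row i is [i, p_0, p_1, ..., p_{num_topics-1}]; any topic missing
    from a document's list gets a proportion of 0.0.
    """
    rows = []
    for doc_id, dist in enumerate(doc_topics):
        row = [0.0] * num_topics
        for topic_id, proportion in dist:
            row[topic_id] = proportion
        rows.append([doc_id] + row)
    return rows

# ASSUMPTION: with a trained Gensim 3.x LdaMallet model this would be
#   rows = doc_topic_rows(ldamallet[mm], num_topics=3)
# Here, illustrative distributions stand in for the model output.
sample = [
    [(0, 0.3055555555555555), (1, 0.3253968253968254), (2, 0.36904761904761907)],
    [(0, 0.5), (1, 0.25), (2, 0.25)],  # hypothetical second document
]
for row in doc_topic_rows(sample, num_topics=3):
    print(row)
```

Because the helper only consumes an iterable, it works the same whether the distributions come from `ldamallet[mm]` or from `load_document_topics()`.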
Additional code that could be relevant:
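import gensim  # gensim.models.wrappers exists only in Gensim 3.x and earlier
from gensim import corpora
# wordsFiltered: list of tokenized documents (list of lists of strings)
id2word = corpora.Dictionary(wordsFiltered)
# Keep tokens in at least 167 documents (defaults also drop tokens in >50% of documents)
id2word.filter_extremes(no_below=167, keep_tokens=None)
# Bag-of-words corpus: one list of (token_id, count) pairs per document
mm = [id2word.doc2bow(wordsFilter) for wordsFilter in wordsFiltered]
mallet_path = 'path'  # placeholder for the path to the Mallet executable
ldamallet = gensim.models.wrappers.LdaMallet(mallet_path, corpus=mm, num_topics=3, id2word=id2word)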
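To store the result, a minimal sketch using only the standard library `csv` module, assuming the per-document distributions are (topic_id, proportion) lists like the one printed above. `write_doc_topics_csv` and the file name `doc_topics.csv` are hypothetical; the dominant-topic column is an optional extra.

```python
import csv
import io
from typing import Iterable, Sequence, TextIO, Tuple

def write_doc_topics_csv(doc_topics: Iterable[Sequence[Tuple[int, float]]],
                         num_topics: int, out: TextIO) -> int:
    """Write one CSV row per document: doc_id, topic_0..topic_{k-1}, dominant_topic.

    Returns the number of documents written.
    """
    writer = csv.writer(out)
    writer.writerow(["doc_id"] + [f"topic_{k}" for k in range(num_topics)] + ["dominant_topic"])
    count = 0
    for doc_id, dist in enumerate(doc_topics):
        row = [0.0] * num_topics
        for topic_id, proportion in dist:
            row[topic_id] = proportion
        # Index of the topic with the largest proportion in this document
        dominant = max(range(num_topics), key=row.__getitem__)
        writer.writerow([doc_id] + row + [dominant])
        count += 1
    return count

# ASSUMPTION: with the model above this could be called as
#   with open("doc_topics.csv", "w", newline="") as f:
#       write_doc_topics_csv(ldamallet[mm], num_topics=3, out=f)
# Here an in-memory buffer and one illustrative document are used instead.
buf = io.StringIO()
n = write_doc_topics_csv([[(0, 0.2), (1, 0.5), (2, 0.3)]], num_topics=3, out=buf)
print(n)
print(buf.getvalue())
```

If pandas is installed, `pandas.read_csv("doc_topics.csv")` should load the file back as a dataframe with one row per document.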
Does anyone have any suggestions? Thanks in advance!