I created an LDA model for some text files using gensim package in python. I want to get topic's distributions for the learned model. Is there any method in gensim ldamodel class or a solution to get topic's distributions from the model? For example, I use the coherence model to find a model with the best cohrence value subject to the number of topics in range 1 to 5. After getting the best model I use get_document_topics method (thanks kenhbs) to get topic distribution in the document that used for creating the model.
id2word = corpora.Dictionary(doc_terms)
bow = id2word.doc2bow(doc_terms)
max_coherence = -1
best_lda_model = None
for num_topics in range(1, 6):
lda_model = gensim.models.ldamodel.LdaModel(corpus=bow, num_topics=num_topics)
coherence_model = gensim.models.CoherenceModel(model=lda_model, texts=doc_terms,dictionary=id2word)
coherence_value = coherence_model.get_coherence()
if coherence_value > max_coherence:
max_coherence = coherence_value
best_lda_model = lda_model
The best has 4 topics
print(best_lda_model.num_topics)
4
But when I use get_document_topics, I get less than 4 values for document distribution.
topic_ditrs = best_lda_model.get_document_topics(bow)
print(len(topic_ditrs))
3
My question is: For best lda model with 4 topics (using coherence model) for a document, why get_document_topics returns fewer topics for the same document? why some topics have very small distribution (less than 1-e8)?