While building a Gensim LDA model, I created a dictionary for my data with the following commands:

    from gensim.corpora import Dictionary
    dictionary1 = Dictionary(docs)
    dictionary1.filter_extremes(no_below=10, no_above=0.75, keep_n = 1000)

Out of these 1000 most frequent tokens, I manually removed 500 so that the remaining tokens are directly related to the topics I want to generate. The pruned dictionary is now of type dict. How can I build a corpus from this new dictionary, and in what form should I pass it to train my LDA model?

1 Answer

You could train the LDA model as follows:

    import gensim

    # Build the bag-of-words corpus: one list of (token_id, count) pairs per document
    corpus = [dictionary1.doc2bow(content) for content in docs]

    # Train an LDA model with 5 topics over 100 passes.
    # The number of topics is chosen arbitrarily in this case.
    # More passes usually lead to better results but increase training time.
    lda_model = gensim.models.ldamodel.LdaModel(corpus, num_topics=5, id2word=dictionary1, passes=100)

    print(lda_model.print_topics(num_topics=5, num_words=3))