Gensim LDA topic assignment

Question

I am hoping to assign each document to one topic using LDA. Now I realise that what you get is a distribution over topics from LDA. However as you see from the last line below I assign it to the most probable topic.

My question is this. I have to run lda[corpus] for somewhat the second time in order to get these topics. Is there some other builtin gensim function that will give me this topic assignment vectors directly? Especially since the LDA algorithm has passed through the documents it might have saved these topic assignments?

    # Get the Dictionary and BoW of the corpus after some stemming/ cleansing
    texts = [[stem(word) for word in document.split() if word not in STOPWORDS] for document in cleanDF.text.values]
    dictionary = corpora.Dictionary(texts)
    dictionary.filter_extremes(no_below=5, no_above=0.9)
    corpus = [dictionary.doc2bow(text) for text in texts]

    # The actual LDA component
    lda = models.LdaMulticore(corpus=corpus, id2word=dictionary, num_topics=30, chunksize=10000, passes=10,workers=4) 

    # Assign each document to most prevalent topic
    lda_topic_assignment = [max(p,key=lambda item: item[1]) for p in lda[corpus]]

Venkatachalam Venkatachalam · Accepted Answer · 2019-01-19T15:53:31

There is no other builtin Gensim function that will give the topic assignment vectors directly.

Your question is valid that LDA algorithm has passed through the documents but implementation of LDA is working by updating the model in chunks (based on value of chunksize parameter), hence it will not keep the entire corpus in-memory.

Hence you have to use lda[corpus] or use the method lda.get_document_topics()

Gensim LDA topic assignment

2 Answers