
Suppose I build an LDA topic model using gensim or sklearn and assign a top topic to each document, but some documents don't match the topic assigned to them. Besides trying different numbers of topics or using the coherence score to pick the optimal number, what other tricks can I use to improve my model?

1 Answer


LDA also (semi-secretly) takes the hyperparameters alpha and beta. Think of alpha as the parameter that controls how many topics each document is generated from, and beta as the parameter that controls how many topics each word can belong to (equivalently, how concentrated each topic's word distribution is). Tuning these may give you better results.
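For example, in gensim these priors are exposed as the `alpha` and `eta` arguments of `LdaModel` (gensim calls beta `eta`). A minimal sketch, assuming you already have a list of tokenized documents called `texts`:

```python
from gensim.corpora import Dictionary
from gensim.models import LdaModel

# `texts` is assumed to be a list of tokenized documents (lists of strings).
dictionary = Dictionary(texts)
corpus = [dictionary.doc2bow(doc) for doc in texts]

lda = LdaModel(
    corpus=corpus,
    id2word=dictionary,
    num_topics=10,
    alpha="auto",   # learn an asymmetric document-topic prior from the data
    eta="auto",     # gensim's name for beta: the topic-word prior
    passes=10,
    random_state=42,
)
```

`"auto"` lets gensim learn the priors during training; you can also pass explicit floats (a smaller alpha pushes each document toward fewer topics, a smaller eta pushes each topic toward fewer words).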

However, LDA is an unsupervised model, and even perfect settings for k, alpha, and beta will still leave some documents incorrectly assigned. And if your data isn't preprocessed well, it almost doesn't matter what you set the parameters to: the model will produce poor results.
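As a sketch of what reasonable preprocessing might look like with gensim (the utilities are standard gensim functions; `raw_documents` is a placeholder for your own list of strings):

```python
from gensim.utils import simple_preprocess
from gensim.parsing.preprocessing import STOPWORDS
from gensim.corpora import Dictionary

def preprocess(doc):
    # Lowercase, tokenize, strip accents/punctuation, drop stopwords and very short tokens.
    return [tok for tok in simple_preprocess(doc, deacc=True)
            if tok not in STOPWORDS and len(tok) > 2]

texts = [preprocess(doc) for doc in raw_documents]

dictionary = Dictionary(texts)
# Prune very rare and very common tokens, which tend to blur topics.
dictionary.filter_extremes(no_below=5, no_above=0.5)
corpus = [dictionary.doc2bow(doc) for doc in texts]
```

Lemmatization on top of this (e.g. with spaCy or NLTK) often helps as well, since it collapses inflected forms of the same word into a single token.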