Suppose I build an LDA topic model using gensim or sklearn and assign a top topic to each document, but some documents don't match the top topic assigned to them. Besides trying out different numbers of topics, or using coherence scores to find the optimal number of topics, what other tricks can I use to improve my model?
1 Answer
LDA also (semi-secretly) takes the parameters alpha and beta. Think of alpha as the parameter that controls how many topics each document is expected to be generated from, and beta as the parameter that controls how many words each topic is expected to concentrate on. You can play with these and you may get better results.
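For example, in gensim you can set both priors explicitly when fitting the model (gensim names beta eta; sklearn exposes the same priors on LatentDirichletAllocation as doc_topic_prior and topic_word_prior). The snippet below is just a sketch with a toy corpus and untuned values, not a recipe:

```python
# A minimal sketch of setting alpha and eta (gensim's name for beta) by hand,
# using a toy tokenized corpus; swap in your own documents and tune the values.
from gensim.corpora import Dictionary
from gensim.models import LdaModel

docs = [
    ["topic", "model", "document", "word"],
    ["word", "distribution", "topic", "prior"],
    ["document", "corpus", "preprocessing", "token"],
]

dictionary = Dictionary(docs)
corpus = [dictionary.doc2bow(doc) for doc in docs]

lda = LdaModel(
    corpus=corpus,
    id2word=dictionary,
    num_topics=2,
    alpha=0.1,   # low alpha: each document is expected to draw on few topics
    eta=0.01,    # low eta (beta): each topic is expected to concentrate on few words
    passes=10,
    random_state=42,
)

# gensim can also learn asymmetric priors from the data instead of fixed scalars:
#   LdaModel(..., alpha="auto", eta="auto")
```

Smaller values make the corresponding distributions sparser, so documents with a single clear theme often benefit from a lower alpha.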
However, LDA is an unsupervised model, and even perfect settings for k, alpha, and beta will still leave some documents incorrectly assigned. And if your data isn't preprocessed well, it almost doesn't matter what values you choose for the parameters; the model will produce poor results regardless.
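On the preprocessing point, here is a rough sketch of the usual cleanup before fitting LDA; the steps and thresholds are illustrative and should be adapted to your corpus:

```python
# A rough preprocessing sketch (tokenize, drop stopwords, filter rare/common words);
# the thresholds here are illustrative and should be tuned to your corpus size.
from gensim.corpora import Dictionary
from gensim.parsing.preprocessing import STOPWORDS
from gensim.utils import simple_preprocess

raw_docs = [
    "Topic models assign each document a mixture of topics.",
    "Poorly preprocessed text leads to noisy, incoherent topics.",
]

# Lowercase, tokenize, and drop stopwords and very short tokens.
tokenized = [
    [tok for tok in simple_preprocess(doc, min_len=3) if tok not in STOPWORDS]
    for doc in raw_docs
]

# Drop words that are too rare or too common to be useful topic indicators.
dictionary = Dictionary(tokenized)
dictionary.filter_extremes(no_below=2, no_above=0.5)

corpus = [dictionary.doc2bow(doc) for doc in tokenized]
```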