Gensim's HDP model for topic modeling (gensim.models.hdpmodel.HdpModel) has a constructor that takes an argument called max_chunks.
According to the documentation, max_chunks is the number of chunks the model will go over; if that is larger than the number of chunks in the supplied corpus, training wraps around the corpus.
Since the INFO logs warned me that the likelihood has been decreasing, I figure I may need multiple passes over the corpus for the model to converge.
The LDA model provides a passes argument for training on the corpus for multiple iterations. I am having difficulty figuring out how max_chunks in HDP maps to passes in LDA.
For example, let's say my corpus has 1,000,000 documents. What exactly does max_chunks need to be in order to train for, say, 3 passes over my corpus?
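To make the arithmetic concrete, here is a rough sketch of my current guess. It assumes that one pass consumes ceil(num_docs / chunksize) chunks (the last chunk may be smaller) and that chunksize is left at 256, which I believe is the HdpModel default. The helper max_chunks_for_passes is a hypothetical name of my own, not part of gensim.

```python
import math


def max_chunks_for_passes(num_docs, passes, chunksize=256):
    """Estimate the max_chunks value that covers `passes` full sweeps.

    Assumption: the corpus is split into chunks of `chunksize` documents,
    so one sweep takes ceil(num_docs / chunksize) chunks, and training
    wraps around to the start of the corpus after each sweep.
    """
    if num_docs <= 0 or passes <= 0 or chunksize <= 0:
        raise ValueError("num_docs, passes and chunksize must be positive")
    chunks_per_pass = math.ceil(num_docs / chunksize)
    return passes * chunks_per_pass


# 1,000,000 documents, 3 passes, chunksize assumed to be 256
print(max_chunks_for_passes(1_000_000, 3))  # 11721
```

If that assumption holds, I would pass the result as max_chunks together with the same chunksize value to the HdpModel constructor, but I have not confirmed that this is equivalent to passes=3 in LDA.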
Any suggestions? Many thanks.