I have a question about calculating topic coherence for LDA models built in scikit-learn.
Topic coherence is a useful metric for measuring the human interpretability of a given LDA topic model. Gensim's CoherenceModel calculates topic coherence for a given LDA model and includes several variants of the measure.
I would like to use scikit-learn's LDA rather than gensim's LDA because of its ease of use and documentation. (Note: I would like to avoid the gensim to scikit-learn wrapper, i.e. I want to use sklearn’s own LDA implementation.) From my research, there seems to be no scikit-learn equivalent of Gensim’s CoherenceModel.
Is there a way to either:
1 - Feed scikit-learn’s LDA model into gensim’s CoherenceModel pipeline to generate topic coherence, either by manually converting the scikit-learn model into gensim format or through a scikit-learn to gensim wrapper (I have only seen the wrapper the other way around)?
Or
2 - Manually calculate topic coherence from scikit-learn’s LDA model and CountVectorizer/Tfidf matrices?
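For option 2, here is a minimal sketch of what a manual calculation might look like, assuming the UMass variant of coherence (Mimno et al., 2011) computed from document co-occurrence counts. The function name `umass_coherence` and its parameters are hypothetical, not part of scikit-learn or gensim. It assumes `components` is an array shaped like `lda.components_` and `doc_term` is the matrix the model was fitted on, so every top term occurs in at least one document.

```python
import numpy as np


def umass_coherence(components, doc_term, top_n=10, eps=1.0):
    """Return one UMass coherence score per topic (hypothetical helper).

    components: array of shape (n_topics, n_terms), e.g. lda.components_
    doc_term:   (n_docs, n_terms) count or tf-idf matrix, dense or scipy sparse
    top_n:      number of highest-weighted terms used per topic
    eps:        smoothing count added to the co-document frequency
    """
    # Only presence/absence of a term in a document matters for UMass.
    X = doc_term.toarray() if hasattr(doc_term, "toarray") else np.asarray(doc_term)
    present = (X > 0).astype(np.int64)

    scores = []
    for topic in np.asarray(components):
        # Indices of the top_n terms, highest weight first.
        top = np.argsort(topic)[::-1][:top_n]
        sub = present[:, top]
        # co[i, j] = number of documents containing both term i and term j;
        # the diagonal co[i, i] is the document frequency of term i.
        co = sub.T @ sub
        score = 0.0
        for m in range(1, len(top)):
            for l in range(m):
                # Assumes co[l, l] > 0, i.e. each top term occurs in some document.
                score += np.log((co[m, l] + eps) / co[l, l])
        scores.append(score)
    return np.array(scores)
```

With a fitted model this would presumably be called as `umass_coherence(lda.components_, X)`, where `X` is the CountVectorizer output. The sketch sums over term pairs, so the values may not match gensim's `u_mass` numerically if gensim aggregates differently. For option 1, the same top-term indices mapped through the vectorizer's vocabulary give one token list per topic, which as far as I can tell gensim's CoherenceModel accepts through its `topics` argument together with `texts` and `dictionary`.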
I have searched online quite a bit for implementations of this use case but haven’t found any solutions. The only leads I have are the equations documented in the scientific literature.
If anyone knows of a similar implementation, or could point me in the right direction for writing a manual method, that would be great. Thank you!
*Side note: I understand that perplexity and log-likelihood are available in scikit-learn for performance measurement, but from what I have read these are not as predictive of human interpretability.