1 vote

The popular topic model Latent Dirichlet Allocation (LDA), when used to extract topics from a corpus, returns different topics with different probability distributions over the dictionary words on each run.

Latent Semantic Indexing (LSI), in contrast, gives the same topics and the same distributions every time.

In practice, LDA is widely used to extract topics. How does LDA maintain consistency if it returns a different topic distribution every time it is run?

Consider this simple example. A sample of documents is taken, where D represents a document:

D1: Linear Algebra techniques for dimensionality reduction
D2: dimensionality reduction of a sample database
D3: An introduction to linear algebra
D4: Measure of similarity and dissimilarity of different web documents
D5: Classification of data using database sample
D6: overfitting due lack of representative samples
D7: handling overfitting in descision tree
D8: proximity measure for web documents
D9: introduction to web query classification
D10: classification using LSI 

Each line represents a document. The LDA model is used to generate topics from the corpus above; Gensim is used for the LDA implementation. Batch LDA is performed, with the number of topics set to 4 and the number of passes set to 20, along the lines of the sketch below.
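
A minimal sketch of such a run in gensim (the variable names are mine, and I'm assuming simple lowercase/whitespace tokenization of the ten documents above):

    from gensim import corpora
    from gensim.models.ldamodel import LdaModel

    # Documents copied verbatim from the question (typos included)
    documents = [
        "Linear Algebra techniques for dimensionality reduction",
        "dimensionality reduction of a sample database",
        "An introduction to linear algebra",
        "Measure of similarity and dissimilarity of different web documents",
        "Classification of data using database sample",
        "overfitting due lack of representative samples",
        "handling overfitting in descision tree",
        "proximity measure for web documents",
        "introduction to web query classification",
        "classification using LSI",
    ]
    texts = [doc.lower().split() for doc in documents]

    # Build the dictionary and the bag-of-words corpus
    dictionary = corpora.Dictionary(texts)
    corpus = [dictionary.doc2bow(text) for text in texts]

    # Batch LDA: update_every=0 disables online updating; 4 topics, 20 passes
    lda = LdaModel(corpus, num_topics=4, id2word=dictionary,
                   passes=20, update_every=0)
    for topic in lda.print_topics(num_topics=4, num_words=10):
        print(topic)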

Batch LDA is first performed on the original corpus, and the topics generated after 20 passes are:

topic #0: 0.045*query + 0.043*introduction + 0.042*similarity + 0.042*different + 0.041*reduction + 0.040*handling + 0.039*techniques + 0.039*dimensionality + 0.039*web + 0.039*using

topic #1: 0.043*tree + 0.042*lack + 0.041*reduction + 0.040*measure + 0.040*descision + 0.039*documents + 0.039*overfitting + 0.038*algebra + 0.038*proximity + 0.038*query

topic #2: 0.043*reduction + 0.043*data + 0.042*proximity + 0.041*linear + 0.040*database + 0.040*samples + 0.040*overfitting + 0.039*lsi + 0.039*introduction + 0.039*using

topic #3: 0.046*lsi + 0.045*query + 0.043*samples + 0.040*linear + 0.040*similarity + 0.039*classification + 0.039*algebra + 0.039*documents + 0.038*handling + 0.037*sample

Batch LDA is then performed again on the same original corpus, and the topics generated that time are:

topic #0: 0.041*data + 0.041*descision + 0.041*linear + 0.041*techniques + 0.040*dimensionality + 0.040*dissimilarity + 0.040*database + 0.040*reduction + 0.039*documents + 0.038*proximity

topic #1: 0.042*dissimilarity + 0.041*documents + 0.041*dimensionality + 0.040*tree + 0.040*proximity + 0.040*different + 0.038*descision + 0.038*algebra + 0.038*similarity + 0.038*techniques

topic #2: 0.043*proximity + 0.042*data + 0.041*database + 0.041*different + 0.041*tree + 0.040*techniques + 0.040*linear + 0.039*classification + 0.038*measure + 0.038*representative

topic #3: 0.043*similarity + 0.042*documents + 0.041*algebra + 0.041*web + 0.040*proximity + 0.040*handling + 0.039*dissimilarity + 0.038*representative + 0.038*tree + 0.038*measure

The word distributions in the topics are not the same in the two cases. In fact, the word distributions are never the same across runs.

So how does LDA work effectively if it doesn't produce the same word distribution in its topics the way LSI does?

I'm not sure I understand the problem. Are you worried that two runs of an LDA training algorithm might return different models? – Fred Foo
@larsmans I added some more information to make my point clear. Hope it is clear. – Kai

4 Answers

4 votes

I think there are two issues here. First, LDA training is not deterministic the way LSI is; the common training algorithms for LDA are sampling methods. If the results of multiple training runs are wildly different, that's either a bug, the wrong settings, or plain bad luck. You can try multiple runs of LDA training if you're trying to optimize some objective function, as in the sketch below.
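
For instance, a sketch of that last idea, assuming the corpus and dictionary from the question and using gensim's log_perplexity (a per-word likelihood bound) as the objective:

    from gensim.models.ldamodel import LdaModel

    # Train several models and keep the one with the best per-word
    # likelihood bound on the training corpus (higher is better)
    models = [LdaModel(corpus, num_topics=4, id2word=dictionary, passes=20)
              for _ in range(5)]
    best = max(models, key=lambda m: m.log_perplexity(corpus))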

Then as for clustering, querying and classification: once you have a trained LDA model, you can apply that model to other documents in a deterministic way. Different LDA models will give you different results, but from one LDA model that you've labeled as the final model, you'll always get the same result.
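
A sketch of that second point, assuming a trained model lda and its dictionary as above:

    # Applying one fixed, final model to a document is repeatable
    new_doc = "dimensionality reduction for web documents"
    bow = dictionary.doc2bow(new_doc.lower().split())
    print(lda[bow])   # topic mixture, e.g. [(2, 0.87), ...] (values hypothetical)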

0 votes

LDA uses randomness in both the training and inference steps, so it will generate different topics every time. See this link: LDA model generates different topics everytime i train on the same corpus
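
A quick sketch of how to see this, reusing the corpus and dictionary built from the question's documents:

    from gensim.models.ldamodel import LdaModel

    lda_a = LdaModel(corpus, num_topics=4, id2word=dictionary, passes=20)
    lda_b = LdaModel(corpus, num_topics=4, id2word=dictionary, passes=20)

    # With no fixed seed, the two runs generally disagree
    print(lda_a.print_topics() == lda_b.print_topics())  # usually False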

0 votes

There are three solutions to this problem:

  1. set a random seed, e.g. random_seed = 123
  2. pickle - you can save your trained model to a file and load it again whenever you like, without the topics changing. You can even transfer the file to another machine and use the model there. Below, we create a file name for the pre-trained model, open the file, dump the model as a pickle, and close the pickle instance; then we load the saved LDA Mallet wrapper back from the pickle:

    import pickle

    # Save the trained LDA Mallet model to a file
    LDAMallet_file = 'Your Model'
    LDAMallet_pkl = open(LDAMallet_file, 'wb')
    pickle.dump(ldamallet, LDAMallet_pkl)
    LDAMallet_pkl.close()

    # Load the saved LDA Mallet model back from the file
    LDAMallet_pkl = open(LDAMallet_file, 'rb')
    ldamallet = pickle.load(LDAMallet_pkl)

    print("Loaded LDA Mallet wrap --", ldamallet)
    

    Check out the documentation: https://docs.python.org/3/library/pickle.html

    Get it? pickle because it preserves ;)

  3. joblib - same as pickle, but better with large numpy arrays; see the sketch below
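
    A minimal sketch of the joblib variant (the file name is hypothetical, and ldamallet is the trained model from above):

        from joblib import dump, load

        # Same idea as pickle, but handles large numpy arrays more efficiently
        dump(ldamallet, 'lda_mallet.joblib')      # save the trained model
        ldamallet = load('lda_mallet.joblib')     # load it back later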

I hope this helps :)

0 votes

I am not entirely sure I understand the problem, but to make it precise: you are saying that LDA produces a different topic distribution on each run over the same set of data.

First, LDA uses randomness to obtain those probability distributions, so for each run you will get different topic weights and words, but you can control this randomness.

import gensim

# Fixing random_state makes the training run reproducible
lda = gensim.models.ldamodel.LdaModel(
    corpus, num_topics=number_of_topics, id2word=dictionary, passes=15, random_state=1)

Notice the use of random_state: if you fix this number, you can easily reproduce the output.
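
For example, two models trained with the same random_state on the same corpus should come out identical:

lda1 = gensim.models.ldamodel.LdaModel(
    corpus, num_topics=number_of_topics, id2word=dictionary, passes=15, random_state=1)
lda2 = gensim.models.ldamodel.LdaModel(
    corpus, num_topics=number_of_topics, id2word=dictionary, passes=15, random_state=1)

print(lda1.print_topics() == lda2.print_topics())  # True: same seed, same topics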