2
votes

I have run LDA on a corpus of documents and found some topics. The output of my code is two probability matrices: one of doc-topic probabilities and one of word-topic probabilities. But I don't know how to use these results to predict the topic of a new document. I am using Gibbs sampling. Does anyone know how? Thanks.

1
I was going to suggest stats.stackexchange.com when I noticed that you've already cross-posted the question there. - NPE
Have you looked at mblondel.org/journal/2010/08/21/… (there is a linked gist with sample code) and blog.josephwilk.net/projects/… - Philip Southam
Your description is a bit confusing, as you wrote that you used LDA to find topics in the documents. As far as I recall from my information retrieval lectures, LDA is an advanced smoothing technique for predicting probabilities of words that are contained in the query but not present in a document, based on the probability that the word would be generated by a certain topic model. So it would be very useful if you provided some more information on what you've actually done so far. - das_weezul
What is it that you want to do with the new test document? Find out topic probabilities for it? Or actually find out what topic each word was generated from? - abhinavkulkarni

1 Answer

3
votes

The Java implementation http://www.arbylon.net/projects/lda-j/lda-j-src-20050325.zip has a short example program in src\org\knowceans\lda\SearchEnglet.java. I hope you are somewhat familiar with Java and that the code helps you.

The original paper http://jmlr.csail.mit.edu/papers/volume3/blei03a/blei03a.pdf describes inference in sections 5.1 and 5.2.
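
If you are sampling with Gibbs, one common way to handle a new document is to keep the trained word-topic probabilities fixed and resample only the topic assignments of the new document's tokens. Below is a minimal Python sketch of that idea, not code taken from the Java package or the paper. The function name `infer_topics`, the layout `phi[k][w]` (probability of word id `w` under topic `k`), and the value of `alpha` are assumptions for illustration; transpose your matrix if it is stored the other way round, and make sure every entry is strictly positive (smoothed).

```python
import random


def infer_topics(doc, phi, alpha=0.1, n_iter=200, seed=0):
    """Estimate topic proportions for one unseen document.

    doc   : list of word ids (ints) for the new document
    phi   : K x V nested list, phi[k][w] = p(word w | topic k), held fixed
            (assumed strictly positive, i.e. smoothed)
    alpha : symmetric Dirichlet prior on the document's topic mixture
    """
    rng = random.Random(seed)
    num_topics = len(phi)

    # Start from a random topic assignment for every token.
    z = [rng.randrange(num_topics) for _ in doc]
    counts = [0] * num_topics
    for k in z:
        counts[k] += 1

    for _ in range(n_iter):
        for i, w in enumerate(doc):
            # Remove the token's current assignment from the counts.
            counts[z[i]] -= 1
            # p(z_i = k | rest) is proportional to phi[k][w] * (n_k + alpha).
            weights = [phi[k][w] * (counts[k] + alpha) for k in range(num_topics)]
            z[i] = rng.choices(range(num_topics), weights=weights)[0]
            counts[z[i]] += 1

    # Topic proportions from the final sample of assignments.
    total = len(doc) + num_topics * alpha
    theta = [(counts[k] + alpha) / total for k in range(num_topics)]
    return theta, z


# Toy usage: 2 topics over a 4-word vocabulary (made-up numbers).
phi = [[0.49, 0.49, 0.01, 0.01],
       [0.01, 0.01, 0.49, 0.49]]
theta, z = infer_topics([0, 1, 0, 1, 0, 0], phi)
print(theta)
```

`theta` answers the "topic probabilities for the document" reading of the question, and `z` gives a sampled topic for each word. This sketch uses only the last sample; averaging `theta` over several samples after a burn-in period should give a more stable estimate.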