I wanted to implement LDA (latent Dirichlet allocation) with TensorFlow as an exercise, and I think a TensorFlow version might have the advantages below:
- Fast, if I can express the sampling process with built-in ops.
- Easy to parallelize. Many ops are implemented with parallelization in mind, so this LDA should be easy to run on GPUs or distributed clusters.
- Shorter and cleaner code. As with many other models, especially neural networks, building a model in TensorFlow usually takes less code.
However, after inspecting some Python implementations of LDA (for example, https://github.com/ariddell/lda/), I have no idea which TensorFlow ops could be used, what kind of graph should be built, or which optimizer I should choose, because the Gibbs sampling process seems to consist entirely of element-wise updates to the doc-topic matrix, the topic-word matrix, and the topic count table. So what can TensorFlow do to simplify and optimize this process?
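To make the question concrete, here is a minimal NumPy sketch of the kind of per-token, element-wise updates I mean (a toy collapsed Gibbs sweep; all sizes, names, and hyperparameters are illustrative, not from any particular library):

```python
import numpy as np

rng = np.random.default_rng(0)

n_topics, n_words, n_docs = 3, 5, 2
alpha, eta = 0.1, 0.01  # symmetric Dirichlet hyperparameters (toy values)

# corpus as flat (doc_id, word_id) token pairs
tokens = [(0, 0), (0, 1), (0, 2), (1, 3), (1, 4), (1, 0)]
z = [int(rng.integers(n_topics)) for _ in tokens]  # topic assignment per token

# the count tables the sampler updates element-wise
ndk = np.zeros((n_docs, n_topics))   # doc-topic counts
nkw = np.zeros((n_topics, n_words))  # topic-word counts
nk = np.zeros(n_topics)              # per-topic totals
for (d, w), k in zip(tokens, z):
    ndk[d, k] += 1
    nkw[k, w] += 1
    nk[k] += 1

def gibbs_sweep():
    """One sweep: resample each token's topic from its full conditional."""
    for i, (d, w) in enumerate(tokens):
        k = z[i]
        # remove this token's current assignment from the counts
        ndk[d, k] -= 1; nkw[k, w] -= 1; nk[k] -= 1
        # full conditional p(z_i = k | rest), up to a normalizing constant
        p = (ndk[d] + alpha) * (nkw[:, w] + eta) / (nk + n_words * eta)
        k_new = rng.choice(n_topics, p=p / p.sum())
        # put the token back under its new topic -- an element-wise update
        ndk[d, k_new] += 1; nkw[k_new, w] += 1; nk[k_new] += 1
        z[i] = k_new

gibbs_sweep()
```

Each inner step mutates single entries of the count tables and immediately depends on the mutated values, which is what seems awkward to express as a TensorFlow graph of vectorized ops.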
And can I instead treat the likelihood of the real input documents under the generative model as the optimization target, and use a gradient-based optimizer to minimize the negative log-likelihood, thereby obtaining alpha, beta, and the doc-topic distributions? If this is tractable, TensorFlow can definitely be used here.
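As a sanity check on that second idea, here is a small NumPy sketch that minimizes the negative log-likelihood of one document under p(w) = sum_k theta_k * beta_{k,w}, with theta kept on the simplex via a softmax over unconstrained logits. A numerical gradient stands in for what autodiff would compute; the fixed beta, the toy counts, and every name here are illustrative assumptions, and this optimizes only theta rather than the full model:

```python
import numpy as np

rng = np.random.default_rng(0)

n_topics, n_words = 3, 5
beta = rng.dirichlet(np.ones(n_words), size=n_topics)  # fixed topic-word dists (toy)
counts = np.array([4.0, 0.0, 2.0, 1.0, 0.0])           # word counts of one toy doc

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def neg_log_lik(logits):
    theta = softmax(logits)      # doc-topic distribution on the simplex
    p_w = theta @ beta           # mixture distribution over words
    return -np.sum(counts * np.log(p_w))

logits = np.zeros(n_topics)
lr, eps = 0.1, 1e-6
losses = [neg_log_lik(logits)]
for _ in range(200):
    # finite-difference gradient in place of TensorFlow's autodiff
    grad = np.array([
        (neg_log_lik(logits + eps * np.eye(n_topics)[k]) - losses[-1]) / eps
        for k in range(n_topics)
    ])
    logits -= lr * grad
    losses.append(neg_log_lik(logits))
```

The loss decreases under plain gradient descent, so the per-document objective is at least differentiable and optimizable; whether this recovers the same posterior quantities as Gibbs sampling is exactly the part of my question I am unsure about.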