3 votes

I wanted to implement LDA with TensorFlow as an exercise, and I think the TensorFlow version may have the advantages below:

  • Speed, if I can express the sampling process with built-in ops.
  • Easy parallelization. Many ops are already implemented with optimizations for parallel execution, so this LDA should be easy to run on GPUs or distributed clusters.
  • Shorter, cleaner code. As with many other models, especially neural networks, building them in TensorFlow involves less code.

However, after inspecting some Python implementations of LDA (for example, https://github.com/ariddell/lda/), I have no idea which TensorFlow ops could be used, what kind of graph should be built, or which optimizer I should choose, because Gibbs sampling seems to consist entirely of element-wise updates to the doc-topic matrix, the topic-word matrix, and the topic count table. So what can TensorFlow do to simplify and optimize this process?
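For context, here is roughly what one sweep of collapsed Gibbs sampling for LDA looks like in plain NumPy. This is my own minimal sketch, not code from the linked repo (function and variable names are mine); it shows why the updates feel so element-wise:

```python
import numpy as np

def gibbs_sweep(docs, z, n_dk, n_kw, n_k, alpha, beta, rng):
    """One sweep of collapsed Gibbs sampling for LDA (illustrative sketch).

    docs : list of word-id lists
    z    : per-token topic assignments, same shape as docs
    n_dk : doc-topic counts, n_kw : topic-word counts, n_k : topic totals
    """
    K, V = n_kw.shape
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            # Remove this token's current assignment from the count tables.
            n_dk[d, k] -= 1; n_kw[k, w] -= 1; n_k[k] -= 1
            # Conditional distribution over topics for this single token.
            p = (n_dk[d] + alpha) * (n_kw[:, w] + beta) / (n_k + V * beta)
            k = rng.choice(K, p=p / p.sum())
            # Write the new assignment back, element by element.
            z[d][i] = k
            n_dk[d, k] += 1; n_kw[k, w] += 1; n_k[k] += 1
```

Every step mutates a single cell of each count table, which is exactly the access pattern that is awkward to express as dense tensor ops.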

Also, can I treat the likelihood of the generated documents with respect to the real input documents as the optimization target, and use a gradient-based optimizer to minimize the negative likelihood, thereby obtaining alpha, beta, and the doc-topic distributions? If this is tractable, TensorFlow can definitely be used here.


2 Answers

6 votes

There are many related answers on the broader question of how probabilistic programming benefits from deep probabilistic programming systems.

I can give one pointed answer for Latent Dirichlet Allocation (LDA) in TensorFlow. A key benefit comes from recognizing that LDA is just a model. Given this model and a dataset represented as a document-by-term matrix (e.g., via tf.SparseTensor), TensorFlow lets you perform not only scalable inference but also very flexible inference. Which TensorFlow ops to use depends on the specific algorithm. You can write a Gibbs sampler or a coordinate ascent variational inference (CAVI) algorithm; both are highly efficient for LDA and can be implemented with manual tf.assign ops on variables. CAVI is computationally and memory-efficient, scales to millions of documents, and combines well with efficient data pipelines such as tf.data.
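To illustrate the kind of update CAVI performs, here is a per-document coordinate ascent sketch in plain NumPy (my own illustration, following the standard mean-field updates for LDA; in TensorFlow these assignments would become tf.assign ops on variables). The digamma helper is a crude series approximation used only to avoid a SciPy dependency:

```python
import math
import numpy as np

def digamma(x):
    # Asymptotic-series approximation of the digamma function.
    r = 0.0
    while x < 6.0:
        r -= 1.0 / x
        x += 1.0
    f = 1.0 / (x * x)
    return r + math.log(x) - 0.5 / x - f * (1/12.0 - f * (1/120.0 - f/252.0))

def cavi_doc_update(word_ids, log_beta, alpha, n_iters=20):
    """Coordinate ascent for one document's variational parameters.

    word_ids : token ids in the document
    log_beta : K x V log topic-word probabilities (fixed here)
    Returns gamma (Dirichlet params of the doc-topic posterior) and phi.
    """
    K = log_beta.shape[0]
    gamma = np.full(K, alpha + len(word_ids) / K)
    for _ in range(n_iters):
        elog_theta = np.array([digamma(g) for g in gamma]) - digamma(gamma.sum())
        # phi update: each token's topic responsibilities (row-wise softmax).
        logits = elog_theta[None, :] + log_beta[:, word_ids].T
        phi = np.exp(logits - logits.max(axis=1, keepdims=True))
        phi /= phi.sum(axis=1, keepdims=True)
        # gamma update: prior plus expected topic counts.
        gamma = alpha + phi.sum(axis=0)
    return gamma, phi
```

Each update is a closed-form assignment rather than a gradient step, which is why an ordinary TensorFlow optimizer is not needed for this route.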

With TensorFlow, you can also use generic methods such as black box variational inference, which are extremely versatile and do not require manual tf.assign ops. Once you've gotten it working well on your problem, you can extend LDA in many ways, such as with nonconjugate priors, hierarchical priors, and deep network parameterizations (possible with tf.layers). Generic methods require tools such as TensorFlow optimizers and TensorFlow's automatic differentiation for gradient-based optimization. These are not available in plain Python unless you exploit tracing tools such as autograd.
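As a sketch of what "black box" means here: the score-function (REINFORCE) gradient estimator needs only log-density evaluations of the model, no model gradients. A toy example on a conjugate Gaussian model, not LDA (model, names, and hyperparameters are my own for illustration):

```python
import numpy as np

def bbvi_fit_mu(x, n_steps=2000, n_samples=50, lr=0.05, seed=0):
    """Black-box VI with the score-function (REINFORCE) estimator.

    Model: z ~ N(0, 1), x | z ~ N(z, 1).  Variational family: q(z) = N(mu, 1),
    with only mu learnable.  The true posterior mean is x / 2.
    """
    rng = np.random.default_rng(seed)
    mu = 0.0
    for _ in range(n_steps):
        z = rng.normal(mu, 1.0, size=n_samples)
        log_p = -0.5 * z**2 - 0.5 * (x - z)**2   # log joint, up to constants
        log_q = -0.5 * (z - mu)**2               # log q(z; mu), up to constants
        score = z - mu                           # d/dmu log q(z; mu)
        grad = np.mean(score * (log_p - log_q))  # score-function ELBO gradient
        mu += lr * grad                          # stochastic gradient ascent
    return mu
```

For x = 2 the fitted mu should land near the true posterior mean of 1. The same recipe applies to any model whose log joint you can evaluate, which is what makes the method "generic"; in practice, frameworks reduce its gradient variance with control variates or reparameterization.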

-1 votes

I have run both models, so I have some ideas from practice. LDA takes the words in documents as input and outputs a topic distribution and a word distribution. Word2Vec outputs a vector representation of a sentence. In your application scenario, the aim is to recommend documents with a similar topic, not a similar sentence meaning. For example, "I find a very cute cat." and "My uncle's cat is fat and I feed it food; I'm satisfied." have different meanings, but both sentences share the same topic: cat. Hope this helps.