4
votes

I am a typical, regular, everyday Spark user. Spark's LDA has two hyperparameters, which the documentation describes as follows:

docConcentration: Hyperparameter for prior over documents’ distributions over topics. Currently must be > 1, where larger values encourage smoother inferred distributions.

topicConcentration: Hyperparameter for prior over topics’ distributions over terms (words). Currently must be > 1, where larger values encourage smoother inferred distributions.

These correspond to the parameters usually denoted $\alpha$ and $\beta$ in the literature. For given values of these parameters (and of $k$, the number of topics), the log-likelihood function of the LDA model is optimized during the convergence process.
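For reference, here is a sketch of where the two priors enter the standard smoothed LDA generative model. The symbols $\theta_d$ (topic mixture of document $d$), $\phi_t$ (word distribution of topic $t$), $z_{d,n}$ and $w_{d,n}$ (topic assignment and word at position $n$ of document $d$) are notation assumed here, not taken from the Spark documentation:

$$
\begin{aligned}
\theta_d &\sim \mathrm{Dirichlet}(\alpha), & d &= 1, \dots, D \\
\phi_t &\sim \mathrm{Dirichlet}(\beta), & t &= 1, \dots, k \\
z_{d,n} &\sim \mathrm{Categorical}(\theta_d) \\
w_{d,n} &\sim \mathrm{Categorical}(\phi_{z_{d,n}})
\end{aligned}
$$

So docConcentration plays the role of $\alpha$ and topicConcentration the role of $\beta$.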

Does anyone know whether there is an option to set these prior parameters in Vowpal Wabbit's LDA model?

2
I updated the wiki with documentation of the Dirichlet priors and all other LDA hyperparameters. - J.Schneider

2 Answers

1
votes

Check this description of vw lda. I think the parameters mentioned on the 13th slide might be the ones you are looking for.

0
votes

Just for the sake of completeness, the LDA implementation offers the following hyperparameters:

Latent Dirichlet Allocation:
  --lda arg                             Run lda with <int> topics

  --lda_alpha arg (=0.100000001)        Prior on sparsity of per-document topic
                                        weights
  --lda_rho arg (=0.100000001)          Prior on sparsity of topic 
                                        distributions
  --lda_D arg (=10000)                  Number of documents
  --lda_epsilon arg (=0.00100000005)    Loop convergence threshold
  --minibatch arg (=1)                  Minibatch size, for LDA
  --math-mode arg (=0)                  Math mode: simd, accuracy, fast-approx
  --metrics arg (=0)                    Compute metrics

You can find the source code for implementation details here.

Or jump directly into the source code of the vw utility, which offers slightly different parameters.
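Going by the help text above, `--lda_alpha` appears to play the role of $\alpha$ (the per-document topic prior) and `--lda_rho` the role of $\beta$ (the per-topic word prior). As a minimal sketch, the following assembles a command line from those options. The helper name `build_vw_lda_command` and the input file `docs.vw` are hypothetical; `-d` is vw's option for the data file, and the remaining flags are the ones listed in the help output.

```python
import shlex


def build_vw_lda_command(data_path, num_topics, alpha=0.1, rho=0.1,
                         num_docs=10000, minibatch=1):
    """Build the argument list for a `vw --lda` run (hypothetical helper).

    alpha -> --lda_alpha (prior on per-document topic weights)
    rho   -> --lda_rho   (prior on topic distributions)
    """
    if num_topics < 1:
        raise ValueError("num_topics must be a positive integer")
    if alpha <= 0 or rho <= 0:
        raise ValueError("Dirichlet priors must be positive")
    return [
        "vw",
        "-d", str(data_path),         # input data file (assumed name)
        "--lda", str(num_topics),     # number of topics k
        "--lda_alpha", str(alpha),
        "--lda_rho", str(rho),
        "--lda_D", str(num_docs),     # number of documents
        "--minibatch", str(minibatch),
    ]


cmd = build_vw_lda_command("docs.vw", num_topics=20, alpha=0.1, rho=0.1,
                           num_docs=5000, minibatch=256)
# Print a shell-ready version of the command for inspection.
print(shlex.join(cmd))
```

If vw is installed, the list could be passed to `subprocess.run(cmd, check=True)`; the values chosen here are illustrative only, not recommended settings.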