I'm working with Spark's MLlib, and I'm currently doing something with LDA.
When I use the code provided by Spark (see below) to predict the topic distribution of a document that was itself used to train the model, the predicted document-topic distribution is completely different from the document-topic distribution produced during training.
I don't know what causes this. Here is my code:
Training:

    val ldaModel = lda.run(corpus)

The corpus is an RDD[(Long, Vector)]: each entry pairs a document id with a term-count vector whose length is the vocabulary size, whose indices are word ids, and whose values are word counts.
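For illustration, a corpus of that shape could be built like this (the toy vocabulary, the documents, and the `sc` SparkContext are assumptions for the sketch, not part of my actual data):

    import org.apache.spark.mllib.linalg.{Vector, Vectors}
    import org.apache.spark.rdd.RDD

    // Toy vocabulary (assumed): 0 = "apple", 1 = "banana", 2 = "cherry".
    // Each entry is (documentId, term-count vector over the vocabulary).
    val corpus: RDD[(Long, Vector)] = sc.parallelize(Seq(
      (0L, Vectors.dense(2.0, 1.0, 0.0)),                    // "apple apple banana"
      (1L, Vectors.sparse(3, Array(1, 2), Array(1.0, 3.0)))  // "banana cherry cherry cherry"
    ))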
Prediction:

    def predict(documents: RDD[(Long, Vector)], ldaModel: LDAModel): Array[(Long, Vector)] = {
      ldaModel match {
        case localModel: LocalLDAModel =>
          localModel.topicDistributions(documents).collect()
        case distModel: DistributedLDAModel =>
          distModel.toLocal.topicDistributions(documents).collect()
      }
    }

(I originally pre-allocated a mutable Array sized by documents.collect().length and assigned into it; returning the match result directly avoids that extra collect() and the var.)
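This is how I compare the two results. DistributedLDAModel.topicDistributions returns the topic mixtures inferred during training, so it can be printed side by side with the output of predict (the variable names distModel and corpus here are assumptions carried over from the snippets above):

    // Topic mixtures computed during training (available on DistributedLDAModel).
    val trained: Array[(Long, Vector)] = distModel.topicDistributions.collect()

    // Topic mixtures re-inferred on the same documents via the local model.
    val predicted: Array[(Long, Vector)] = predict(corpus, distModel)

    // Print both distributions for each document id.
    trained.sortBy(_._1).zip(predicted.sortBy(_._1)).foreach {
      case ((id, t), (_, p)) =>
        println(s"doc $id\n  trained:   $t\n  predicted: $p")
    }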
ldaModel is a DistributedLDAModel. Is that correct? – Jason Lenderman