1 vote

I'm using the Spark 1.3.0 (Scala 2.10.x) MLlib LDA algorithm through the Spark Java API. I get the following exception at runtime when I try to read the document-topic distributions from the LDA model:

"main" java.lang.ClassCastException: [Lscala.Tuple2; cannot be cast to scala.Tuple2

I have given the relevant code below:

// Train an LDA model with k = 3 topics; in Spark 1.3.0, run() returns a DistributedLDAModel
DistributedLDAModel ldaModel = new LDA().setK(3).run(corpus);
// Per-document topic distributions as a Scala RDD of (document id, topic weights) pairs
RDD<Tuple2<Object, Vector>> topicDist = ldaModel.topicDistributions();

How do I read or display the contents of "topicDist" (the documents and their topic distributions) as a JavaRDD?

Seems like the required type is a Tuple2[] array, but you give a standard type. – Mordechai
@MouseEvent: the other way around, actually :). – mikołak
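To illustrate the type mismatch the comments describe: from the Java API, collect() on the Scala RDD is typed as Object but actually returns a Tuple2[] array, so casting it to a single Tuple2 raises exactly the exception above. This is a hypothetical reproduction (not necessarily the asker's code), reusing the topicDist variable from the question:

// collect() on the Scala RDD returns an Object that is really a Tuple2[] array
Object collected = topicDist.collect();
// Casting that array to a single Tuple2 throws "[Lscala.Tuple2; cannot be cast to scala.Tuple2"
Tuple2<Object, Vector> first = (Tuple2<Object, Vector>) collected;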

1 Answer

0 votes

I found the solution and have given it below:

// Convert the Scala RDD returned by topicDistributions() to a JavaRDD
JavaRDD<Tuple2<Object, Vector>> topicDist = ldaModel.topicDistributions().toJavaRDD();

// Collect the (document id, topic distribution) pairs to the driver
List<Tuple2<Object, Vector>> list = topicDist.collect();
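A minimal sketch of printing the collected pairs, assuming the list variable above (_1 is the document id, _2 holds the per-topic weights):

// Print each document id together with its topic distribution
for (Tuple2<Object, Vector> docTopics : list) {
    long docId = (Long) docTopics._1();   // document id
    Vector dist = docTopics._2();         // one weight per topic (k = 3 here)
    System.out.println("doc " + docId + " -> " + dist);
}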