I am using Spark Streaming to read data from a Kafka cluster. I want to sort a pair DStream and get only the top N elements. So far I have sorted it using:
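```scala
// Sum the values per key over a sliding window
val result = ds.reduceByKeyAndWindow((x: Double, y: Double) => x + y,
  Seconds(windowInterval), Seconds(batchInterval))
// transform returns a new DStream, so the sorted stream must be captured and printed
val sorted = result.transform(rdd => rdd.sortBy(_._2, ascending = false))
sorted.print()
```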
My questions are:
- How do I get only the top N elements from the DStream?
- The `transform` operation is applied RDD by RDD. Will the result be sorted across the elements of all RDDs? If not, how can I achieve that?
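One possible approach, sketched under stated assumptions: the keys are `String`s (the question does not say), and `TopNSketch`, `topNOf` and `topN` are hypothetical names. On the second question: `reduceByKeyAndWindow` emits one RDD per slide interval that holds the aggregated values for the whole window, and `sortBy` performs a total sort across all partitions of that RDD. So each output RDD is sorted across every element of its window, but there is no ordering between different batches. The sketch below picks the top N of each such RDD with `RDD.top`, which avoids a full sort.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.dstream.DStream

// Hypothetical helper; assumes (String, Double) pairs as produced by the windowed sum.
object TopNSketch {
  // Order pairs by their value (the windowed sum), not by their key
  private val byValue: Ordering[(String, Double)] =
    Ordering.by[(String, Double), Double](_._2)

  // Top n pairs of one RDD, largest value first.
  // rdd.top keeps only n candidates per partition and returns an Array to the driver.
  def topNOf(rdd: RDD[(String, Double)], n: Int): Array[(String, Double)] =
    rdd.top(n)(byValue)

  // Top n per batch (i.e. per window) as a DStream. The transform function runs
  // on the driver once per batch, so calling an action (top) here is allowed;
  // the small result is turned back into an RDD so DStream operations still apply.
  def topN(stream: DStream[(String, Double)], n: Int): DStream[(String, Double)] =
    stream.transform { rdd =>
      rdd.sparkContext.parallelize(topNOf(rdd, n).toSeq, numSlices = 1)
    }
}
```

Usage would look like `TopNSketch.topN(result, 10).print()`. If the goal is only to display the results, printing a stream that is already sorted as in the question with `print(n)` is simpler, because `DStream.print(num)` prints the first `num` elements of each RDD.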