I'm trying to run Spark (1.3.1) MLlib k-means clustering on a DataFrame of floating-point numbers. I'm following the clustering example provided by Spark:
https://spark.apache.org/docs/1.3.1/mllib-clustering.html
However, instead of a text file, I'm using a DataFrame composed of a single column of doubles (for simplicity). As per the MLlib docs, I need to convert this column to a Vector for the KMeans function.
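For reference, here is a minimal sketch of how a DataFrame of the same shape can be built in the spark-shell (my real data comes from elsewhere; the column name "value" and the literal numbers are purely illustrative):

import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc) // sc is the spark-shell's SparkContext

// Wrap each double in Tuple1 so toDF() can infer a one-column schema
val data = sc.parallelize(Seq(1.0, 2.5, 3.7).map(Tuple1.apply)).toDF("value")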
So far I have this:

import org.apache.spark.mllib.linalg.Vectors

// data is the single-column DataFrame of doubles described above
val parsedData = data.map(s => Vectors.dense(s(0))).cache()
and I receive the error:

error: overloaded method value dense with alternatives:
  (values: Array[Double])org.apache.spark.mllib.linalg.Vector <and>
  (firstValue: Double,otherValues: Double*)org.apache.spark.mllib.linalg.Vector
 cannot be applied to (Any)
       val parsedData = data.map(s => Vectors.dense(s(0))).cache()
                                              ^
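My reading of the error: mapping over a DataFrame hands me org.apache.spark.sql.Row objects, and indexing a Row with s(0) is typed as Any, so the compiler cannot pick either dense overload. Using the typed getter compiles for me (a sketch; getDouble(0) assumes column 0 really holds doubles and fails at runtime otherwise):

// Typed getter so the compiler sees Double instead of Any;
// throws at runtime if column 0 is not actually a double
val parsedData = data.map(s => Vectors.dense(s.getDouble(0))).cache()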
Is there a better way of doing this?
I have read two similar posts, but neither quite covers my case: How to turn a known structured RDD to Vector, and How to convert org.apache.spark.rdd.RDD[Array[Double]] to Array[Double] which is required by Spark MLlib (which deals with text data).