I'm relatively new to Scala and the Spark API, and I have a question about using the VectorAssembler
http://spark.apache.org/docs/latest/ml-features.html#vectorassembler
so that I can then compute matrix correlations
https://spark.apache.org/docs/2.1.0/mllib-statistics.html#correlations
The assembled DataFrame column is reported as a vector type (org.apache.spark.ml.linalg.VectorUDT):
import org.apache.spark.ml.feature.VectorAssembler
val assembler = new VectorAssembler()
val trainwlabels3 = assembler.transform(trainwlabels2)
trainwlabels3.dtypes(0)
res90: (String, String) = (features,org.apache.spark.ml.linalg.VectorUDT@3bfc3ba7)
Yet turning this column into an RDD for the statistics tool fails with a type mismatch error:
val data: RDD[Vector] = sc.parallelize(
trainwlabels3("features")
)
<console>:80: error: type mismatch;
found : org.apache.spark.sql.Column
required: Seq[org.apache.spark.mllib.linalg.Vector]
Thanks in advance for any help.
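A possible direction, offered as a sketch under assumptions rather than a confirmed fix for this exact setup. The error appears to have two parts. First, `trainwlabels3("features")` returns a `Column` (a reference to the column, not its data), while `sc.parallelize` expects a local `Seq`. Second, VectorAssembler produces `org.apache.spark.ml.linalg.Vector`, whereas `Statistics.corr` on the linked mllib page takes an RDD of `org.apache.spark.mllib.linalg.Vector`. Assuming Spark 2.x, the sketch below reads the rows through `.rdd` and converts each vector with `Vectors.fromML`. The object name, the helper names, the sample data, and the column names `a`, `b`, `c` are hypothetical stand-ins, since the original input columns are not shown.

```scala
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.linalg.{Vector => MLVector}
import org.apache.spark.mllib.linalg.{Matrix, Vector => MLlibVector, Vectors => MLlibVectors}
import org.apache.spark.mllib.stat.Statistics
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, SparkSession}

object CorrelationSketch {

  // Pull one vector column out of a DataFrame as an RDD of mllib vectors.
  // Vectors.fromML converts the new ml.linalg type to the old mllib.linalg type.
  def toMllibRdd(df: DataFrame, colName: String): RDD[MLlibVector] =
    df.select(colName).rdd.map(row => MLlibVectors.fromML(row.getAs[MLVector](0)))

  // Build a tiny hypothetical DataFrame, assemble it, and compute Pearson correlations.
  def demo(spark: SparkSession): Matrix = {
    import spark.implicits._

    // Hypothetical stand-in for trainwlabels2; the real columns are unknown.
    val trainwlabels2 = Seq(
      (1.0, 2.0, 3.0),
      (2.0, 4.0, 1.0),
      (3.0, 6.0, 2.0)
    ).toDF("a", "b", "c")

    val assembler = new VectorAssembler()
      .setInputCols(Array("a", "b", "c")) // assumed input columns
      .setOutputCol("features")
    val trainwlabels3 = assembler.transform(trainwlabels2)

    val data: RDD[MLlibVector] = toMllibRdd(trainwlabels3, "features")
    Statistics.corr(data, "pearson")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("correlation-sketch")
      .getOrCreate()
    try println(demo(spark))
    finally spark.stop()
  }
}
```

In a spark-shell session the existing `spark` session and `trainwlabels3` could be passed straight to `toMllibRdd`, skipping the sample data entirely.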