Following the answer to this question: How to convert type Row into Vector to feed to the KMeans,
I have created the features table for my data (`assembler` is a VectorAssembler):
val kmeanInput = assembler.transform(table1).select("features")
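For context, the assembler was set up roughly like this (a sketch only; the input column names here are placeholders, not my real schema):

```scala
import org.apache.spark.ml.feature.VectorAssembler

// Hypothetical column names for illustration; the real table has different ones.
val assembler = new VectorAssembler()
  .setInputCols(Array("col1", "col2", "col3"))
  .setOutputCol("features")

val kmeanInput = assembler.transform(table1).select("features")
```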
When I run k-means on `kmeanInput`,
val clusters = KMeans.train(kmeanInput, numCluster, numIteration)
I get the error:

:102: error: type mismatch;
 found   : org.apache.spark.sql.DataFrame (which expands to org.apache.spark.sql.Dataset[org.apache.spark.sql.Row])
 required: org.apache.spark.rdd.RDD[org.apache.spark.mllib.linalg.Vector]
       val clusters = KMeans.train(kmeanInput, numCluster, numIteration)
As @Jed mentioned in his answer, this happens because the rows are not in `Vectors.dense` format.
To solve this I tried
val dat = kmeanInput.rdd.map(lambda row: Vectors.dense([x for x in row["features"]]))
And I get these errors:

:3: error: ')' expected but '(' found.
       val dat = kmeanInput.rdd.map(lambda row: Vectors.dense([x for x in row["features"]]))
:3: error: ';' expected but ')' found.
       val dat = kmeanInput.rdd.map(lambda row: Vectors.dense([x for x in row["features"]]))
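For what it's worth, the `lambda`/list-comprehension syntax above is Python, not Scala, which is why the compiler rejects it. A Scala sketch of the intended conversion might look like the following (assuming the `features` column holds an `org.apache.spark.ml.linalg.Vector`, which is what VectorAssembler produces, while the RDD-based `KMeans.train` expects the older `mllib.linalg.Vector`):

```scala
import org.apache.spark.mllib.linalg.{Vector, Vectors}

// Pull the ml.linalg.Vector out of each Row and rebuild it as an
// mllib.linalg.Vector, since the two vector types are not interchangeable.
val dat: org.apache.spark.rdd.RDD[Vector] = kmeanInput.rdd.map { row =>
  Vectors.dense(row.getAs[org.apache.spark.ml.linalg.Vector]("features").toArray)
}
dat.cache() // KMeans makes multiple passes over the data

val clusters = KMeans.train(dat, numCluster, numIteration)
```

This is only a sketch of one possible fix, not a verified answer.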