I have written the following code to convert a Spark SQL DataFrame df to an RDD[LabeledPoint]:
val targetInd = df.columns.indexOf("myTarget")
val ignored = List("myTarget")
val featInd = df.columns.diff(ignored).map(df.columns.indexOf(_))
df.printSchema
val dfLP = df.rdd.map(r => LabeledPoint(
r.getDouble(targetInd),
Vectors.dense(featInd.map(r.getDouble(_)).toArray)
))
The schema looks like this:
root
|-- myTarget: long (nullable = true)
|-- var1: long (nullable = true)
|-- var2: double (nullable = true)
When I run dfLP.foreach(l => l.label), then the following error occurs:
java.lang.ClassCastException: java.lang.Long cannot be cast to java.lang.Double
How can I cast the label to double? I assume the other features can be either double or long, is that right? If not, I will also need to cast the remaining features to double.
r.getLong(targetInd).toDouble inside the map or df.withColumn("myTarget", df("myTarget").cast("double")) before it. Note that it should be done for each Long column. - Daniel de Paula
val dfLP = df.rdd.map(r => LabeledPoint( r.getDouble(targetInd).toDouble, Vectors.dense(featInd.map(r.getDouble(_).toDouble).toArray) )), but still the same error when I do dfLP.foreach(l => l.label) - duckertito
featInd)? df=df.withColumn("myTarget", df("myTarget").cast("double")) - duckertito
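For reference, the cast-before-mapping suggestion can be sketched as follows. Row.getDouble casts the stored value to Double, so it fails on a column stored as Long, and appending .toDouble afterwards does not help because the exception is thrown first. This is only a sketch under some assumptions: it uses SparkSession (Spark 2.x or later) and the spark.mllib LabeledPoint, since the question states neither the Spark version nor the imports; the helper name toLabeledPoints and the toy data are hypothetical; and it assumes no nulls, since getDouble also fails on a null value.

```scala
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object CastToLabeledPoint {
  // Hypothetical helper (not from the question): casts every column to
  // double first, so that Row.getDouble is safe for the label and features.
  def toLabeledPoints(df: DataFrame, target: String): RDD[LabeledPoint] = {
    val dfD = df.select(df.columns.map(c => col(c).cast("double").as(c)): _*)
    val targetInd = dfD.columns.indexOf(target)
    // Every column except the target is treated as a feature.
    val featInd = dfD.columns.indices.filter(_ != targetInd).toArray
    dfD.rdd.map(r => LabeledPoint(
      r.getDouble(targetInd),
      Vectors.dense(featInd.map(i => r.getDouble(i)))
    ))
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session, only for trying the sketch out.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("cast-example")
      .getOrCreate()
    import spark.implicits._

    // Toy data mirroring the schema in the question (long, long, double).
    val df = Seq((1L, 10L, 0.5), (0L, 20L, 1.5)).toDF("myTarget", "var1", "var2")

    toLabeledPoints(df, "myTarget").collect().foreach(println)
    spark.stop()
  }
}
```

Casting all columns in one select avoids tracking which ones are long. If only some columns should be converted, the same cast can be applied per column with withColumn, assigning the result to a new val, because a val df cannot be reassigned.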