I would like to estimate the following model using Spark ML's MultilayerPerceptronClassifier:
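import org.apache.spark.ml.feature.RFormula

// Build the features vector and the label column from the raw columns
val formula = new RFormula()
  .setFormula("vtplus15predict~ vhisttplus15 + vhistt + vt + vtminus15 + Time + Length + Day")
  .setFeaturesCol("features")
  .setLabelCol("label")
formula.fit(data).transform(data)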
Note: features is a vector and label is a double, as the schema shows:
root
|-- features: vector (nullable = true)
|-- label: double (nullable = false)
I define my MLP estimator as follows:
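import org.apache.spark.ml.classification.MultilayerPerceptronClassifier

// input layer, two hidden layers, output layer
val layers = Array[Int](6, 5, 8, 1) // I suspect this is where it went wrong
val mlp = new MultilayerPerceptronClassifier()
  .setLayers(layers)
  .setBlockSize(128)
  .setSeed(1234L)
  .setMaxIter(100)
// train the model
val model = mlp.fit(train)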
Unfortunately, I got the following error:
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 3.0 failed 1 times, most recent failure: Lost task 0.0 in stage 3.0 (TID 3, localhost, executor driver): java.lang.ArrayIndexOutOfBoundsException: 11
    at org.apache.spark.ml.classification.LabelConverter$.encodeLabeledPoint(MultilayerPerceptronClassifier.scala:121)
    at org.apache.spark.ml.classification.MultilayerPerceptronClassifier$$anonfun$3.apply(MultilayerPerceptronClassifier.scala:245)
    at org.apache.spark.ml.classification.MultilayerPerceptronClassifier$$anonfun$3.apply(MultilayerPerceptronClassifier.scala:245)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:363)
    at scala.collection.Iterator$GroupedIterator.takeDestructively(Iterator.scala:935)
    at scala.collection.Iterator$GroupedIterator.go(Iterator.scala:950)
    ...
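Judging from the LabelConverter$.encodeLabeledPoint frame in the trace, my guess is that the classifier turns each label into a one-hot vector whose length is the last entry of layers, using the label value as the index. The following is only a plain-Scala sketch of that assumption, with no Spark involved; OneHotSketch and oneHot are made-up names, not Spark's API. If the assumption holds, a label of 11 cannot fit into an output layer of size 1.

```scala
// Hypothetical sketch of one-hot label encoding; not Spark's actual code.
object OneHotSketch {
  // Encode `label` as a vector of length `outputSize` with a single 1.0.
  def oneHot(label: Double, outputSize: Int): Array[Double] = {
    val out = Array.fill(outputSize)(0.0)
    // Throws ArrayIndexOutOfBoundsException when label.toInt is outside 0 until outputSize
    out(label.toInt) = 1.0
    out
  }

  def main(args: Array[String]): Unit = {
    // An output layer of size 12 has room for label 11.0
    println(oneHot(11.0, 12).mkString(","))
    // An output layer of size 1, as in my layers array, does not
    println(scala.util.Try(oneHot(11.0, 1)).isFailure)
  }
}
```

If this reading is right, the last entry of layers would have to be at least the largest label value plus one, and the labels would have to be non-negative integer class indices rather than continuous values.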