
Hi, I am using Spark ML to train a model. The training dataset has 130 columns and 10 million rows. The problem is that whenever I run the MultilayerPerceptronClassifier, it fails with the following error:

org.apache.spark.SparkException: Job aborted due to stage failure: Task 43 in stage 1882.0 failed 4 times, most recent failure: Lost task 43.3 in stage 1882.0 (TID 180174, 10.233.252.145, executor 6): java.lang.ArrayIndexOutOfBoundsException

Interestingly, this does not happen when I use other classifiers such as Logistic Regression or Random Forest.

My Code:

from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import MultilayerPerceptronClassifier

# Building the model
inputneurons = len(features_columns)

# Assembling the feature vectors
assembler = VectorAssembler(inputCols=features_columns, outputCol="features")

# Multilayer perceptron classifier
mlp = MultilayerPerceptronClassifier(labelCol=label, featuresCol="features", layers=[inputneurons, 300, 2])

# Pipelining the assembling and modeling steps
pipeline = Pipeline(stages=[assembler, mlp])
model = pipeline.fit(training_df)

What could be the reason behind such an issue with MLP in Spark?


1 Answer


There were more than two classes in the label, but in the MultilayerPerceptronClassifier I had specified only 2 output neurons, which resulted in the ArrayIndexOutOfBoundsException. The size of the last entry in layers must match the number of distinct classes in the label column.
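
A minimal sketch of the fix, reusing the variable names from the question (training_df, features_columns, label): count the distinct label values first and use that count as the size of the output layer.

from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import MultilayerPerceptronClassifier

# Determine how many classes the label actually has, so the output
# layer of the MLP matches the data instead of being hard-coded to 2.
num_classes = training_df.select(label).distinct().count()
inputneurons = len(features_columns)

assembler = VectorAssembler(inputCols=features_columns, outputCol="features")

# The last entry of `layers` must equal the number of classes.
mlp = MultilayerPerceptronClassifier(
    labelCol=label,
    featuresCol="features",
    layers=[inputneurons, 300, num_classes],
)

model = Pipeline(stages=[assembler, mlp]).fit(training_df)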