
I am training and saving an XGBoost model as shown below:

XGBoost Version 0.82

Spark Version 2.4.2

Get Model (invokes train function)

def getModel(trainingData: DataFrame): PipelineModel = {
    val pipelineModel = train(trainingData)

    if (modelPathToSave != "") {
      pipelineModel.write.overwrite().save(modelPathToSave)
      println(f"Saved model to $modelPathToSave")
    }
    pipelineModel
}

Train Model

def train(trainingData: DataFrame): PipelineModel = {
    val nh = new NullHandler()
      .setCols(hackyEncode(featureList))
      .setMethod("fill")

    val va = new VectorAssembler()
      .setInputCols(hackyDecode(nh.getCols).toArray)
      .setOutputCol(featuresCol)

    val xgb = new XGBoostClassifier()
      .setLabelCol("label")
      .setFeaturesCol("features")
      .setEta(0.3)
      .setMaxDepth(8)
      .setObjective("binary:logistic")
      .setEvalMetric("auc")  
      .setScalePosWeight(9)

    val pipeline = new Pipeline()
      .setStages(Array[PipelineStage](nh, va, xgb))

    pipeline.fit(trainingData)
}

However, I got this error:

Exception in thread "main" java.lang.NoSuchMethodError: shaded.json4s.jackson.JsonMethods$.parse(Lshaded/json4s/JsonInput;Z)Lshaded/json4s/JsonAST$JValue;
    at ml.dmlc.xgboost4j.scala.spark.params.DefaultXGBoostParamsWriter$$anonfun$1$$anonfun$3.apply(DefaultXGBoostParamsWriter.scala:73)
    at ml.dmlc.xgboost4j.scala.spark.params.DefaultXGBoostParamsWriter$$anonfun$1$$anonfun$3.apply(DefaultXGBoostParamsWriter.scala:71)

This happens despite having json4s in my build.sbt file:

  "org.json4s" %% "json4s-native" % "3.5.1",
  "org.json4s" %% "json4s-jackson" % "3.6.6", 

Can anyone help, please?


1 Answer


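XGBoost 0.82 is not compatible with Spark 2.4. XGBoost4J-Spark 0.82 was built against Spark 2.3, and Spark 2.4 ships a newer json4s in which the parse method has a different signature. That mismatch is what surfaces as the NoSuchMethodError, so adding json4s to build.sbt is unlikely to fix it. You can either downgrade to Spark 2.3 or upgrade to XGBoost 0.90.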
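If you take the upgrade route, the dependency section of build.sbt might look roughly like the sketch below. This is only an illustration under some assumptions: that the project builds with Scala 2.11 (which, as far as I know, is what the 0.90 artifacts target), that Spark is provided by the cluster, and that no other dependency pins a conflicting json4s. Adjust the versions and scopes to match your setup.

```scala
// build.sbt (sketch): pair Spark 2.4.x with XGBoost4J-Spark 0.90.
// Assumption: Scala 2.11; change this if your project uses another version.
scalaVersion := "2.11.12"

libraryDependencies ++= Seq(
  // Assumption: Spark is supplied by the cluster, hence the "provided" scope.
  "org.apache.spark" %% "spark-sql"   % "2.4.2" % "provided",
  "org.apache.spark" %% "spark-mllib" % "2.4.2" % "provided",

  // A single % is used here because the 0.90 artifact id carries no Scala suffix.
  "ml.dmlc" % "xgboost4j-spark" % "0.90"
)
```

It may also be worth removing the explicit json4s-native and json4s-jackson entries, so that the json4s version Spark brings in is the only one on the classpath. Running `sbt evicted` afterwards shows which versions sbt actually resolved.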

Reference:

https://discuss.xgboost.ai/t/xgboost-0-8-2-and-spark-2-4-0-unable-to-save-pipeline-model-into-aws-s3/838