I am trying to apply StringIndexer to multiple columns, using Scala and Spark 2.3.
This is my code:
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.feature.StringIndexer

// Load the training data; `spark` is the active SparkSession
val df1 = spark.read
  .format("csv")
  .option("header", "true")
  .option("inferSchema", "true")
  .load("file:///c:/tmp/spark-warehouse/train.csv")
// Index every column except BsmtFinSF1
val feat = df1.columns.filterNot(_.contains("BsmtFinSF1"))
val inds = feat.map { colName =>
  val indexer1 = new StringIndexer()
    .setInputCol(colName)
    .setOutputCol(colName + "I")
    .fit(df1)
  Array(indexer1)
}
val pipeline = new Pipeline().setStages(inds.toArray)
But I get this error:
Error:(134, 50) type mismatch;
 found   : Array[Array[org.apache.spark.ml.feature.StringIndexerModel]]
 required: Array[? <: org.apache.spark.ml.PipelineStage]
Note: Array[org.apache.spark.ml.feature.StringIndexerModel] >: ? <: org.apache.spark.ml.PipelineStage, but class Array is invariant in type T.
You may wish to investigate a wildcard type such as _ >: ? <: org.apache.spark.ml.PipelineStage. (SLS 3.2.10)
val pipeline = new Pipeline().setStages(inds.toArray)
Any help would be appreciated. Thank you.
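
For reference, here is a minimal sketch of one way the mismatch might be resolved. The error message shows that the last expression in the map block, Array(indexer1), turns every element into an array, so inds ends up as an Array[Array[StringIndexerModel]] rather than a flat array of stages. Returning the indexer itself should give setStages the flat array it expects. This sketch also leaves out the per-column .fit(df1), on the assumption that fitting the whole Pipeline once is acceptable. The small in-memory DataFrame and the column names MSZoning and Street are hypothetical stand-ins for train.csv.

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.feature.StringIndexer
import org.apache.spark.sql.SparkSession

// Local session for the sketch; in spark-shell this returns the existing one
val spark = SparkSession.builder()
  .appName("MultiColumnStringIndexer")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Hypothetical stand-in for train.csv (column names other than BsmtFinSF1 are assumptions)
val df1 = Seq(("RL", "Pave", 706), ("RM", "Grvl", 0), ("RL", "Pave", 486))
  .toDF("MSZoning", "Street", "BsmtFinSF1")

val feat = df1.columns.filterNot(_.contains("BsmtFinSF1"))

// Return the indexer itself, not Array(indexer), so inds is Array[StringIndexer]
val inds = feat.map { colName =>
  new StringIndexer()
    .setInputCol(colName)
    .setOutputCol(colName + "I")
}

// Array[StringIndexer] conforms to Array[_ <: PipelineStage]
val pipeline = new Pipeline().setStages(inds)

// Fitting the pipeline fits each StringIndexer in turn
val indexed = pipeline.fit(df1).transform(df1)
indexed.show()
```

If the fitted models are needed up front, keeping .fit(df1) and ending the block with indexer1 instead of Array(indexer1) should also type-check, since StringIndexerModel is a PipelineStage as well.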