I want to run this code in PySpark (Spark 2.1.1):
from pyspark.ml.feature import PCA
bankPCA = PCA(k=3, inputCol="features", outputCol="pcaFeatures")
pcaModel = bankPCA.fit(bankDf)
pcaResult = pcaModel.transform(bankDf).select("label", "pcaFeatures")
pcaResult.show(truncate=False)
But I get this error:
requirement failed: Column features must be of type
org.apache.spark.ml.linalg.VectorUDT@3bfc3ba7 but was actually org.apache.spark.mllib.linalg.VectorUDT@f71b0bce.