I have a DataFrame with a column "EVENT_ID" whose datatype is String. I am running the FPGrowth algorithm, but it throws the error below:
Py4JJavaError: An error occurred while calling o1711.fit.
:java.lang.IllegalArgumentException: requirement failed:
The input column must be array, but got string.
The column EVENT_ID has values:
E_34503_Probe
E_35203_In
E_31901_Cbc
I am using the code below to convert the string column to an array type:
df2 = df.withColumn("EVENT_ID", df["EVENT_ID"].cast(types.ArrayType(types.StringType())))
But I get the following error:
Py4JJavaError: An error occurred while calling o1874.withColumn.
: org.apache.spark.sql.AnalysisException: cannot resolve '`EVENT_ID`' due to data type mismatch: cannot cast string to array<string>;;
How do I either cast this column to array type or run the FPGrowth algorithm with string type?
Use pyspark.sql.functions.array, for example: df2 = df.withColumn("EVENT_ID", array(df["EVENT_ID"]))
– pault