These are the values of my dataframe:
+-------+----------+
| ID| Date_Desc|
+-------+----------+
|8951354|2012-12-31|
|8951141|2012-12-31|
|8952745|2012-12-31|
|8952223|2012-12-31|
|8951608|2012-12-31|
|8950793|2012-12-31|
|8950760|2012-12-31|
|8951611|2012-12-31|
|8951802|2012-12-31|
|8950706|2012-12-31|
|8951585|2012-12-31|
|8951230|2012-12-31|
|8955530|2012-12-31|
|8950570|2012-12-31|
|8954231|2012-12-31|
|8950703|2012-12-31|
|8954418|2012-12-31|
|8951685|2012-12-31|
|8950586|2012-12-31|
|8951367|2012-12-31|
+-------+----------+
I tried to compute the median (along with the 25th and 75th percentiles) of the ID column per date in PySpark:
import pyspark.sql.functions as f

df1 = df1.groupby('Date_Desc').agg(
    f.expr('percentile(ID, array(0.25))')[0].alias('%25'),
    f.expr('percentile(ID, array(0.50))')[0].alias('%50'),
    f.expr('percentile(ID, array(0.75))')[0].alias('%75'))
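To make this easier to reproduce, here is the same aggregation as a standalone snippet on a small hand-made DataFrame (the session setup and sample rows are mine, just for illustration); on clean data like this it runs without problems:

from pyspark.sql import SparkSession
import pyspark.sql.functions as f

spark = SparkSession.builder.getOrCreate()

# Hand-made sample standing in for my real data
sample = spark.createDataFrame(
    [(8951354, '2012-12-31'), (8951141, '2012-12-31'), (8952745, '2012-12-31')],
    ['ID', 'Date_Desc'])

sample.groupby('Date_Desc').agg(
    f.expr('percentile(ID, array(0.25))')[0].alias('%25'),
    f.expr('percentile(ID, array(0.50))')[0].alias('%50'),
    f.expr('percentile(ID, array(0.75))')[0].alias('%75')).show()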
But when I run it on my actual data I get this error:
Py4JJavaError: An error occurred while calling o198.showString. : org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 29.0 failed 1 times, most recent failure: Lost task 1.0 in stage 29.0 (TID 427, 5bddc801333f, executor driver): org.apache.spark.SparkUpgradeException: You may get a different result due to the upgrading of Spark 3.0: Fail to parse '11/23/04 9:00' in the new parser. You can set spark.sql.legacy.timeParserPolicy to LEGACY to restore the behavior before Spark 3.0, or set to CORRECTED and treat it as an invalid datetime string.
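From the message, the failure seems to come from parsing a date string ('11/23/04 9:00') somewhere upstream of Date_Desc rather than from the percentile expression itself. The error suggests two workarounds; a sketch of what they would look like (the 'M/d/yy H:mm' pattern is only my guess at the raw source format, and it assumes Date_Desc still holds the raw string at that point):

# Option 1 (quoted in the error): restore the pre-Spark-3.0 parser behaviour
spark.conf.set('spark.sql.legacy.timeParserPolicy', 'LEGACY')

# Option 2: re-parse the raw string with an explicit pattern wherever
# Date_Desc is built ('M/d/yy H:mm' is an assumption about the source data)
df1 = df1.withColumn('Date_Desc', f.to_date(f.to_timestamp('Date_Desc', 'M/d/yy H:mm')))

How can I get the percentile aggregation to run without hitting this parsing error?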