
I am trying to compare a column with a date data type to another value, but I am getting an error.

df2.printSchema()

root
 |-- family: string (nullable = true)
 |-- entry_date: array (nullable = true)
 |    |-- element: date (containsNull = true)

The following line gives an error:

df3 = df2.withColumn("_entry_date", when(df2["entry_date"] == lit("1979-01-01"), None))

**Error**

"cannot resolve '(entry_date = '1979-01-01')' due to data type mismatch: differing types in '(entry_date = '1979-01-01')' (array and string).;;\n'Project [family#1149, entry_date#1164, CASE WHEN (entry_date#1164 = 1979-01-01) THEN null END AS _entry_date#1167]\n+- AnalysisBarrier\n +- Aggregate [family#1149], [family#1149, collect_list(CASE WHEN isnull(_date#1154) THEN 1979-01-01 ELSE cast(_date#1154 as string) END, 0, 0) AS entry_date#1164]\n +- Project [id#1148, family#1149, date#1150, to_date(from_unixtime(unix_timestamp('date, yyyy-mm-dd, None), yyyy-MM-dd HH:mm:ss, None), None) AS _date#1154]\n +- LogicalRDD [id#1148, family#1149, date#1150], false\n"

What is the schema of df2, or how do you transform df to df2? Is df2.entry_date an ArrayType? What's your Spark version? - jxc
@jxc I have updated the question. - Gaurang Shah
What is your expected result? To filter out some dates from this array field, or to reset the value to NaT when it is '1979-01-01'? Can you add some example data and the expected result? - jxc

1 Answer


This worked for me:

df3 = df2.withColumn("_entry_date", when(df2["entry_date.element"] == lit("1979-01-01"), None))