I am trying to convert a string column having dates in some format(YYYY-mm-dd) into a date/timestamp column. But not able to do so by using sql's str_to_date function.
I have followed this Convert date from String to Date format in Dataframes, But it works for a single value, instead i want the date timestamp conversion to be applied over all rows of a column, so i tried the below
sparkDF = spark.createDataFrame([('a', '2018-01-20'), ('b', '2018-01-21'), ('c', '2018-01-22')], ['name','enroll_DATE'])
>>> sparkDF.show()
+----+-----------+
|name|enroll_DATE|
+----+-----------+
| a| 2018-01-20|
| b| 2018-01-21|
| c| 2018-01-22|
+----+-----------+
>>> sparkDF.printSchema()
root
|-- name: string (nullable = true)
|-- enroll_DATE: string (nullable = true)
sparkDF.registerTempTable('data')
res = spark.sql("select name, enroll_DATE, TO_DATE(CAST(UNIX_TIMESTAMP(enroll_DATE, '%Y-%m-%d') as TIMESTAMP)) as converted_ED from data ")
But getting all values for the converted_ED column as null.
There is obviously better ways to handle from DataFrame operations, but i have limitation(or constraint you can say) to do this in sparkDataframe transformation way. Basically spark.sql is mandatory for me.
As per the link, if it is possible to do the same for a single date_string value, so there might be a way already exist(i am unaware of) for doing the same over a column.
Please suggest. Thanks.
Edit
By changing the format string from '%Y-%m-%d' to 'YYYY-mm-dd', the conversion worked but in a weird way(two dates are different!!!!)
res = spark.sql("SELECT name, enroll_DATE, TO_DATE(CAST(UNIX_TIMESTAMP(enroll_DATE, 'YYYY-mm-dd') AS TIMESTAMP)) AS newdate from data")
>>> res.show()
+----+-----------+----------+
|name|enroll_DATE| newdate|
+----+-----------+----------+
| a| 2018-01-20|2017-12-31| ### two dates are different
| b| 2018-01-21|2017-12-31|
| c| 2018-01-22|2017-12-31|
+----+-----------+----------+
>>> res.printSchema()
root
|-- name: string (nullable = true)
|-- enroll_DATE: string (nullable = true)
|-- newdate: date (nullable = true)
But still looking for doing the same for '%Y-%d-%m' format too(if there is any solution.).