I am using PySpark 2.3. I have a DataFrame containing a 'TIME' column that holds DateTime values as strings, where the column looks like this:
+---------------+
| TIME|
+---------------+
| 2016/04/14 190|
| 2016/04/15 180|
|2016/04/14 1530|
|2016/04/16 1530|
| 2016/04/17 160|
+---------------+
where, in values such as 190 or 1530, the first two digits represent the hour and the remaining digits are the minutes (so 190 means 19:00 and 1530 means 15:30).
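For reference, here is a minimal sketch that reproduces this frame; only the column name TIME and the sample values come from above, while the SparkSession setup and the createDataFrame call are just how I build the example:

from pyspark.sql import SparkSession
from pyspark.sql import functions as fn
from pyspark.sql.types import TimestampType

spark = SparkSession.builder.getOrCreate()

# Sample rows with the variable-length hour/minute block at the end
df = spark.createDataFrame(
    [('2016/04/14 190',), ('2016/04/15 180',), ('2016/04/14 1530',),
     ('2016/04/16 1530',), ('2016/04/17 160',)],
    ['TIME'])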
I tried to convert it to a timestamp type using the following lines:
df.withColumn('TIME_timestamp', fn.unix_timestamp('TIME', 'yyyy/MM/dd HHMM').cast(TimestampType()))
and also:
df.withColumn('TIME_timestamp', fn.to_timestamp("TIME", 'yyyy/MM/dd HHMM'))
but the result is:
+---------------+-------------------+
| TIME| TIME_timestamp|
+---------------+-------------------+
| 2016/04/14 190| null|
| 2016/04/15 180| null|
|2016/04/14 1530| null|
|2016/04/16 1530| null|
| 2016/04/17 160| null|
+---------------+-------------------+
So the desired DataFrame should look like this:
+---------------+
| TIME_timestamp|
+---------------+
| 16-04-14 19:00|
| 16-04-15 18:00|
| 16-04-14 15:30|
| 16-04-16 15:30|
| 16-04-17 16:00|
+---------------+