
I have a pyspark dataframe df1

id      account        created_date
1       A-111          1487384387
2       B-222          
3       C-333
4       D-444          1372873827

I want to populate the current system timestamp (epoch) wherever created_date is null. I tried the following:

current_date = unix_timestamp(current_timestamp()) * 1000
df1 = df1.na.fill({'created_date': current_date})   

but I am getting the error "Column is not iterable". How can I achieve this?

1 Answer

Use cast("long") to convert current_timestamp() to an epoch timestamp in seconds. The coalesce function returns the first non-null value among its arguments, so it can fill the nulls in created_date. (Your na.fill call fails because na.fill expects a plain Python literal, not a Column expression.)

from pyspark.sql.functions import current_timestamp, coalesce

df1.withColumn('created_date', coalesce('created_date',
        current_timestamp().cast("long"))).show()

+---+-------+------------+                                                      
| id|account|created_date|
+---+-------+------------+
|  1|  A-111|  1487384387|
|  2|  B-222|  1604798619|
|  3|  C-333|  1604798619|
|  4|  D-444|  1372873827|
+---+-------+------------+