0
votes

I have two columns. How can I fill one of them based on the other?

Where col1 is null and col2 is non-null, set col1 to col2; everywhere else keep col1 as it is:

col1     col2
null       'us'
'us'       null
'us'       null 
null       'us'
null       'us'
null       null 

output

col1     col2
'us'       'us'
'us'       null
'us'       null 
'us'       'us'
'us'       'us'
null       null 
1
And how is pyspark related to this? (Alberto Sinigaglia)

1 Answer

1
votes

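Use pyspark.sql.functions.coalesce(*cols), which returns the first column that is not null. With col1 listed first, col1 is kept wherever it is set and falls back to col2 only where it is null.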

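from pyspark.sql import SparkSession
from pyspark.sql.functions import coalesce, col

# Reuse the active session if one exists (e.g. in a notebook), else create one.
spark = SparkSession.builder.getOrCreate()

data = spark.createDataFrame([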
    (None, 'us'),
    ('us', None),
    (None, 'us'),
    (None, None),
], ['col1', 'col2'])

data.withColumn('col1', coalesce(col('col1'), col('col2'))).show(10)
# +----+----+
# |col1|col2|
# +----+----+
# |  us|  us|
# |  us|null|
# |  us|  us|
# |null|null|
# +----+----+
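
To make the row-by-row behaviour concrete, here is a minimal plain-Python sketch of what coalesce does, applied to the six rows from the question. It does not use Spark at all; `coalesce_values` is a hypothetical helper written only for illustration, with None standing in for null.

```python
# Plain-Python model of coalesce: return the first argument that is not None.
# coalesce_values is a hypothetical helper, not part of PySpark.
def coalesce_values(*values):
    for value in values:
        if value is not None:
            return value
    return None  # every argument was None


# The six (col1, col2) rows from the question, with None in place of null.
rows = [
    (None, 'us'),
    ('us', None),
    ('us', None),
    (None, 'us'),
    (None, 'us'),
    (None, None),
]

# Fill col1 from col2 where col1 is None; col2 is left untouched.
filled = [(coalesce_values(col1, col2), col2) for col1, col2 in rows]

for row in filled:
    print(row)
```

The last row stays (None, None) because neither column has a value to fall back on, which matches the expected output in the question.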