
Can anyone suggest how I can add multiple empty columns to a PySpark DataFrame? Currently I am doing something like this, but it's not working:

from pyspark.sql.functions import lit
from pyspark.sql.types import StringType

def add_columns(dataframe, column_list):
    for col in column_list:
        self = dataframe.withColumn(str(col), lit(None).cast(StringType()))
    return dataframe

After applying add_columns, the output schema contains a new column named <generator object <genexpr> at 0x7f41189d7e10>: string (nullable = true)
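A guess at where that column name comes from, assuming column_list ends up holding a generator expression rather than the names themselves (the question does not show how column_list is built): calling str() on a generator object produces exactly this kind of text. A plain-Python sketch, with hypothetical column names:

```python
# Hypothetical column names, for illustration only.
names = ["col_a", "col_b"]

# Suspected mistake: a generator expression wrapped in a list, so the loop
# sees a single generator object instead of the individual names.
wrong_list = [(name for name in names)]
wrong_names = [str(col) for col in wrong_list]
print(wrong_names)  # one entry like '<generator object <genexpr> at 0x...>'

# A plain list of strings yields the intended column names.
right_list = list(names)
right_names = [str(col) for col in right_list]
print(right_names)  # ['col_a', 'col_b']
```

If this is the cause, passing a plain list of strings to add_columns should give the expected column names.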

Try replacing lit(None) with lit('')? - ags29
What is column_list, and what values does it contain? What do you expect str(col) to produce here? Maybe it should be col.name instead. - vvg

1 Answer


Your code snippet works for me with one small change inside the function:

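from pyspark.sql import functions as f
from pyspark.sql.types import StringType

def add_columns(dataframe, column_list):
    # withColumn returns a new DataFrame, so keep building on the latest result
    result = dataframe
    for col in column_list:
        result = result.withColumn(str(col), f.lit(None).cast(StringType()))
    return result

I return the accumulated DataFrame instead of the original dataframe argument. DataFrames are immutable: withColumn does not modify the DataFrame it is called on but returns a new one with the extra column. In your loop, each new DataFrame was built from the unchanged input and then thrown away, so the function returned the input as-is.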
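To see why reusing the returned object matters without starting Spark, here is a pure-Python sketch. FrozenFrame is a hypothetical stand-in for a DataFrame, not part of PySpark; like withColumn, its with_column method returns a new object and leaves the original untouched.

```python
class FrozenFrame:
    """Hypothetical immutable stand-in for a DataFrame (illustration only)."""

    def __init__(self, columns=()):
        self.columns = tuple(columns)

    def with_column(self, name):
        # Return a new frame; never mutate self.
        return FrozenFrame(self.columns + (name,))


def add_columns_buggy(frame, column_list):
    for col in column_list:
        updated = frame.with_column(col)  # new frame is created, then discarded
    return frame  # the untouched input comes back


def add_columns_fixed(frame, column_list):
    result = frame
    for col in column_list:
        result = result.with_column(col)  # build on the previous result
    return result


base = FrozenFrame(["id"])
print(add_columns_buggy(base, ["a", "b"]).columns)  # ('id',)
print(add_columns_fixed(base, ["a", "b"]).columns)  # ('id', 'a', 'b')
print(base.columns)  # ('id',), the input itself is never changed
```

The buggy version mirrors the loop in the question: every iteration starts again from the original frame, and the returned value is the original frame.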