I have a PySpark DataFrame with a complex column; see the sample value below:

ID  value
1   [{"label":"animal","value":"cat"},{"label":null,"value":"George"}]

I want to add a new column to the DataFrame that converts this value into a list of strings. If label is null, the string should contain just the value; if label is not null, the string should be "label:value". For the example above, the output should look like this:

ID   new_column
 1   ["animal:cat", "George"]

1 Answer

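You can use the transform higher-order function (available in Spark SQL since 2.4) to turn each array element into a string built with concat_ws. Because concat_ws skips null arguments, an element with a null label yields just its value, with no leading separator: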

df2 = df.selectExpr(
    'id',
    "transform(value, x -> concat_ws(':', x['label'], x['value'])) as new_column"
)

df2.show()
+---+--------------------+
| id|          new_column|
+---+--------------------+
|  1|[animal:cat, George]|
+---+--------------------+
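
To make the null handling explicit, here is a minimal plain-Python sketch of what the expression computes for each array element. It is not Spark code: concat_ws_like and to_labels are hypothetical helper names used only for illustration, and the sample row is modelled as a list of dicts.

```python
from typing import Dict, List, Optional


def concat_ws_like(sep: str, *parts: Optional[str]) -> str:
    """Join the non-null parts with sep, skipping None values.

    This mimics the null-skipping behaviour of Spark's concat_ws
    that the answer relies on; it is an illustration, not Spark's code.
    """
    return sep.join(p for p in parts if p is not None)


def to_labels(items: List[Dict[str, Optional[str]]]) -> List[str]:
    """Apply the per-element rule: 'label:value', or just 'value' if label is null."""
    return [concat_ws_like(":", item.get("label"), item.get("value")) for item in items]


# Sample row from the question, with null represented as None.
row = [{"label": "animal", "value": "cat"}, {"label": None, "value": "George"}]
result = to_labels(row)
print(result)  # ['animal:cat', 'George']
```

The second element comes out as "George" rather than ":George" because the missing label is dropped before joining, which is why the Spark expression needs no explicit null check.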