I have a pyspark dataframe df. it has 2 columns like the example input shown below. I would like to create a new output dataframe, with a new column 'col3' that only has the alphanumeric values from the strings in col2.
I've tried using spark sql with
regexp_extract('('+col1+')','[^[A-Za-z0-9] ]', 0)
but it only returns null.
can anyone suggest how to do this?
input
df.show()
+----+----+
|col1|col2|
+----+----+
|1 |ab& |
+----+----+
|2 |efg |
+----+----+
output
+----+----+
|col1|col3|
+----+----+
|1 |ab |
+----+----+
|2 |efg |
+----+----+