I have a PySpark DataFrame and would like to apply a UDF to a column that contains null values.
Below is my dataframe:
+----+----+
|   a|   b|
+----+----+
|null|  00|
|.Abc|null|
|/5ee|  11|
|null|   0|
+----+----+
Below is the desired DataFrame (remove punctuation and convert string values to upper case in column a, leaving null rows as null):
+----+----+
|   a|   b|
+----+----+
|null|  00|
| ABC|null|
| 5EE|  11|
|null|   0|
+----+----+
Below is my UDF and code:
import pyspark.sql.functions as F
import re
remove_punct = F.udf(lambda x: re.sub(r'[^\w\s]', '', x))
df = df.withColumn('a', F.when(F.col("a").isNotNull(), F.upper(remove_punct(F.col("a")))))
Below is the error:
TypeError: expected string or bytes-like object
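The message matches what re.sub() raises when it is handed None instead of a string. A likely explanation is that Spark does not short-circuit conditional expressions around Python UDFs, so the lambda still runs on the null rows despite the isNotNull() guard. The sketch below reproduces the failure in plain Python and shows a null-safe variant; the name remove_punct_upper is hypothetical and not part of the original code.

```python
import re

def remove_punct_upper(value):
    """Strip punctuation and upper-case a string; pass None through unchanged."""
    if value is None:
        return None
    return re.sub(r"[^\w\s]", "", value).upper()

# Reproduce the failure: re.sub() rejects None, which is how a null cell
# reaches a Python UDF.
try:
    re.sub(r"[^\w\s]", "", None)
    raised = False
except TypeError:
    raised = True

# The null-safe variant handles both null and non-null inputs.
results = [remove_punct_upper(v) for v in [None, ".Abc", "/5ee"]]
```

Wrapping this function with F.udf(remove_punct_upper) in place of the lambda should make the outer F.when guard unnecessary, since the null check now lives inside the function.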
Can you please suggest the best way to get the desired DataFrame?
Thanks in advance!