I have a DataFrame whose schema is as follows:
root
 |-- key: string (nullable = true)
 |-- value: array (nullable = true)
 |    |-- element: string (containsNull = true)
I want to remove the leading whitespace (if any) from each element of the array in the value column. I think it will be something like the code below:
from pyspark.sql.functions import col, regexp_replace
df.select(regexp_replace(col("value"), r"^\s+", "")).show()
The df:
+---+------------------------+
|key|                   value|
+---+------------------------+
| k1|        [ x1 x2, x3, x4]|
| k2|         [x5, x6 x7, x8]|
| k3| [ x9 x10, x11, x12 x13]|
+---+------------------------+
Expected result:
+---+------------------------+
|key|                   value|
+---+------------------------+
| k1|         [x1 x2, x3, x4]|
| k2|         [x5, x6 x7, x8]|
| k3|  [x9 x10, x11, x12 x13]|
+---+------------------------+
(All leading whitespace in the array elements must be removed.) Thank you.
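Since value is an array of strings rather than a single string, regexp_replace most likely cannot be applied to the column directly and has to be applied to each element instead. Below is a minimal sketch of one possible approach, not a definitive solution. It assumes Spark 3.1 or later, where pyspark.sql.functions.transform is available. The names LEADING_WS, strip_leading_local and strip_leading_spark are hypothetical helpers introduced only for illustration; the plain-Python function just states the intended per-element behaviour so it can be checked without a cluster.

```python
import re

# Hypothetical constant: one or more whitespace characters at the start of a string.
LEADING_WS = r"^\s+"


def strip_leading_local(elements):
    """Plain-Python reference for the intended behaviour on one array.

    None elements are passed through unchanged, since the schema
    allows nulls inside the array (containsNull = true).
    """
    return [e if e is None else re.sub(LEADING_WS, "", e) for e in elements]


def strip_leading_spark(df, column="value"):
    """Apply the same pattern to every element of an array<string> column.

    Assumes Spark 3.1+ (pyspark.sql.functions.transform).
    """
    # Imported lazily so the reference helper above runs without Spark installed.
    from pyspark.sql import functions as F

    cleaned = F.transform(
        F.col(column),
        lambda e: F.regexp_replace(e, LEADING_WS, ""),
    )
    return df.withColumn(column, cleaned)
```

With the DataFrame above, this would be called as strip_leading_spark(df).show(truncate=False). On Spark 2.4 to 3.0 the Python transform function is not available, but the same higher-order function can be reached as a SQL expression through pyspark.sql.functions.expr.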