I did that with a UDF, like this:
from pyspark.sql.functions import udf

def trim(string):
    return string.strip()

# Wrap the plain Python function as a Spark UDF (returns StringType by default)
trim = udf(trim)

df = sqlContext.createDataFrame([(' 2015-04-08 ', ' 2015-05-10 ')], ['d1', 'd2'])
df2 = df.select(trim(df['d1']).alias('d1'), trim(df['d2']).alias('d2'))
The output looks like this:
df.show()

+------------+------------+
|          d1|          d2|
+------------+------------+
| 2015-04-08 | 2015-05-10 |
+------------+------------+

df2.show()

+----------+----------+
|        d1|        d2|
+----------+----------+
|2015-04-08|2015-05-10|
+----------+----------+
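
As an aside, newer Spark versions ship a built-in trim as a column function, so you can skip the UDF entirely. A minimal sketch, reusing the same df as above:

from pyspark.sql.functions import trim

# Built-in trim runs inside the JVM, avoiding the per-row Python
# serialization round-trip that a udf incurs
df2 = df.select(trim(df['d1']).alias('d1'), trim(df['d2']).alias('d2'))

Same result, and it tends to be faster on large DataFrames since no Python worker is involved.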