I would like to remove blank spaces from all values in a specific column(purch_location). I am using a spark table rather then a dataframe or SQL table (but I can use dataframe or SQL table if need be).
Here is how my spark table was generated:
dataset = spark.table("trans")
cols = dataset.columns
dataset.show(5)
+----------+-------------+-------+--------------------+--------------+---+---+
|purch_date| purch_class|tot_amt| serv-provider|purch_location| id| y|
+----------+-------------+-------+--------------------+--------------+---+---+
|06/11/2017| Printing| -0.66| CARDS AND POCKETS| | 0| 0|
|03/11/2017|Uncategorized| -17.53| HOVER | | 1| 0|
|02/11/2017| Groceries| -70.05|1774 MAC'S CONVEN...| BRAMPTON | 2| 1|
|31/10/2017|Gasoline/Fuel| -20.0| ESSO | | 3| 0|
|31/10/2017| Travel| -9.0|TORONTO PARKING A...| TORONTO | 4| 0|
+----------+-------------+-------+--------------------+--------------+---+---+
I have attempted the following function pyspark.sql.functions import regexp_replace
from pyspark.sql.functions import regexp_replace
dataset1=dataset.select(regexp_replace(col("purch_location"),"\\s+",""))
Which removes the blank spaces AFTER the value in the column but not before. It also gets rid of the rest of my columns which I would like to keep.
+-------------------------------------+
|regexp_replace(purch_location, \s+, )|
+-------------------------------------+
| |
| |
| BRAMPTON|
| |
| TORONTO|
| |
| BRAMPTON|
| BRAMPTON|
| |
| null|
| MISSISSAUGA|
| |
| BRAMPTON|
Any idea how I can resolve this? Thanks in advance.