
I have a dataframe "df1" which looks like below.

+----------+----------+
|     col1 |     col2 |
+----------+----------+
| 11111111 |123456789 |
| 11111111 |          |
| 11111111 |123456789 |
| 11111111 |          |
+----------+----------+

I want to filter the dataframe by removing the rows where col2 contains only spaces.

My command in the Scala spark-shell is:

val df3 = crpsdfs.filter($"GASP_NATID01_CD" != "")

But the resultant dataframe still has the rows with spaces.

Expected result is:

+----------+----------+
|     col1 |     col2 |
+----------+----------+
| 11111111 |123456789 |
| 11111111 |123456789 |
+----------+----------+

Could you please help?


1 Answer


For your example, try

df1.filter("cast(col2 as int) > 0")
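This works because Spark SQL turns a blank or non-numeric string into null when cast to int, and a null comparison is never true, so those rows are dropped by the filter. A plain-Scala sketch of the same idea (the sample values here are illustrative, taken from the question's data, and `toIntOption` is Scala 2.13+):

```scala
// Casting a blank string to int fails (null in Spark SQL, None here),
// so only genuinely numeric col2 values survive the filter.
val values = Seq("123456789", "         ")
val kept   = values.flatMap(_.trim.toIntOption).filter(_ > 0)
// kept contains only 123456789
```

Note the limitation: this relies on col2 always being numeric when present, which is why it is labelled as working only "for your example".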

Generally, there may not be a simple condition to filter out spaces. You may try

import org.apache.spark.sql.Row

spark.sqlContext.createDataFrame(
  df1.rdd.filter { case Row(col1, col2) => col2.asInstanceOf[String].trim != "" },
  df1.schema)
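If staying in the DataFrame API is acceptable, the same filter can likely be written without dropping to the RDD level, e.g. `df1.filter(trim($"col2") =!= "")` using `trim` from `org.apache.spark.sql.functions` and the `=!=` Column operator (a sketch, not tested against your data). The trim predicate itself, shown on plain Scala tuples mirroring the question's dataframe:

```scala
// Rows mirroring the question's dataframe: (col1, col2)
val rows = Seq(
  ("11111111", "123456789"),
  ("11111111", "         "), // col2 is only spaces
  ("11111111", "123456789"),
  ("11111111", "         ")
)

// Keep only rows whose col2 is non-empty after trimming whitespace
val kept = rows.filter { case (_, col2) => col2.trim.nonEmpty }
```

Unlike the cast-to-int approach, this keeps any non-blank value in col2, numeric or not.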