1 vote

I have spark data frame like following:

 +----------+-------------------------------------------------+  
 |col1      |words                                            |  
 +----------+-------------------------------------------------+  
 |An        |[An, attractive, ,, thin, low, profile]          |  
 |attractive|[An, attractive, ,, thin, low, profile]          |  
 |,         |[An, attractive, ,, thin, low, profile]          |  
 |thin      |[An, attractive, ,, thin, low, profile]          |    
 |rail      |[An, attractive, ,, thin, low, profile]          |  
 |profile   |[An, attractive, ,, thin, low, profile]          |  
 |Lighter   |[Lighter, than, metal, ,, Level, ,, and, tes]    |  
 |than      |[Lighter, than, metal, ,, Level, ,, and, tww]    |  
 |steel     |[Lighter, than, metal, ,, Level, ,, and, test]   |  
 |,         |[Lighter, than, metal, ,, Level, ,, and, Test]   |  
 |Level     |[Lighter, than, metal, ,, Level, ,, and, test]   |  
 |,         |[Lighter, than, metal, ,, Level, ,, and, ste]    |  
 |and       |[Lighter, than, metal, ,, Level, ,, and, ste]    |  
 |Test      |[Lighter, than, metal, ,, Level, ,, and, Ste]    |  
 |Renewable |[Renewable, resource]                            |  
 |Resource  |[Renewable, resource]                            |  
 |No        |[No1, Bal, testme, saves, time, and, money]      |  
 +----------+-------------------------------------------------+  

I want to filter the data in the above column case-insensitively. Currently I am doing it like this:

df.filter(array('words, "level")).show(false)

but it is not showing any data. Please help me resolve the issue.

How do you want the output to look? - koiralo
The dataframe from above, keeping the rows that contain "level". - Narendra Mohan Prasad
Care to elaborate on "I want to filter the data from the above column as case insensitive. "? Do you want to search words in words column (that seems to be of array type)? Why not to use col1 instead since it's already available? - Jacek Laskowski
Sure @JacekLaskowski, I will try that. I want to select all the rows whose array-type column contains the given word. - Narendra Mohan Prasad

2 Answers

2
votes

For this you can create a simple UDF that converts both sides to lower case and filters.

Here is a simple example:

scala> import spark.implicits._
import spark.implicits._

scala> import org.apache.spark.sql.functions._
import org.apache.spark.sql.functions._

scala> val df = Seq(("An", List("An", "attractive"," ","", "thin", "low", "profile")), ("Lighter", List("Lighter", "than", "metal"," " ,"", "Level"," " ,"", "and", "tes"))).toDF("col1", "words")
df: org.apache.spark.sql.DataFrame = [col1: string, words: array<string>]

scala> val filterUdf = udf((arr: Seq[String]) => arr.map(_.toLowerCase).contains("level".toLowerCase))
filterUdf: org.apache.spark.sql.expressions.UserDefinedFunction = UserDefinedFunction(<function1>,BooleanType,Some(List(ArrayType(StringType,true))))

scala> df.filter(filterUdf($"words")).show(false)

+-------+-------------------------------------------------+
|col1   |words                                            |
+-------+-------------------------------------------------+
|Lighter|[Lighter, than, metal,  , , Level,  , , and, tes]|
+-------+-------------------------------------------------+
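On Spark 2.4 or later you can avoid the UDF entirely and use the SQL higher-order function exists, which short-circuits and stays inside the optimizer. A minimal self-contained sketch (sample rows are illustrative, trimmed to two rows from the question):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.expr

val spark = SparkSession.builder().master("local[*]").appName("hof-filter").getOrCreate()
import spark.implicits._

val df = Seq(
  ("An", Seq("An", "attractive", "thin", "low", "profile")),
  ("Lighter", Seq("Lighter", "than", "metal", "Level", "and", "tes"))
).toDF("col1", "words")

// exists(array, lambda) is a built-in higher-order function (Spark 2.4+),
// so no UDF registration or serialization is needed
val result = df.filter(expr("exists(words, w -> lower(w) = 'level')"))
result.show(false)
```

The lambda runs per array element, so the comparison is case-insensitive without materializing a lowercased copy of the whole array.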

Hope this helps!

1 vote

Datasets are often easier to work with than DataFrames, so I suggest you convert your dataframe to a dataset, or just create the Dataset directly from your source data.

Supposing you have a dataset with a case class as

case class data(col1: String, words: Array[String])

For illustration purposes I am creating a temporary dataset as

import sqlContext.implicits._
val ds = Seq(
  data("profile", Array("An", "attractive", "", "", "thin", "low", "profile")),
  data("Lighter", Array("Lighter", "than", "metal", "", "", "Level", "", "", "and", "tes"))
).toDS

which is similar to the dataframe you have:

+-------+-----------------------------------------------+
|col1   |words                                          |
+-------+-----------------------------------------------+
|profile|[An, attractive, , , thin, low, profile]       |
|Lighter|[Lighter, than, metal, , , Level, , , and, tes]|
+-------+-----------------------------------------------+

You can perform RDD-like operations on the dataset and filter in the rows that contain Level as

ds.filter(row => row.words.map(element => element.toLowerCase).contains("level")).show(false)

the result is

+-------+-----------------------------------------------+
|col1   |words                                          |
+-------+-----------------------------------------------+
|Lighter|[Lighter, than, metal, , , Level, , , and, tes]|
+-------+-----------------------------------------------+

Updated

As you are struggling to convert a dataframe to a dataset, here's one way to do it.

Suppose you have a dataframe (df) as

+---+-------------+--------+---+
|age|maritalStatus|name    |sex|
+---+-------------+--------+---+
|35 |M            |Joanna  |F  |
|25 |S            |Isabelle|F  |
|19 |S            |Andy    |M  |
|70 |M            |Robert  |M  |
+---+-------------+--------+---+

Then a case class should be created to match the schema of df as

case class dataset(age: Int, maritalStatus: String, name: String, sex: String)

Then converting the dataframe with as[dataset] should do the trick:

val dataSet : Dataset[dataset] = df.as[dataset]

Then you can proceed as explained in the first part of this answer.
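Putting the two halves together, here is a minimal self-contained sketch that converts an untyped dataframe like the one in the question to a typed Dataset and filters it case-insensitively (the sample rows are illustrative, and the data case class is the same one defined earlier in this answer):

```scala
import org.apache.spark.sql.SparkSession

// same case class as in the first part of the answer
case class data(col1: String, words: Array[String])

val spark = SparkSession.builder().master("local[*]").appName("ds-filter").getOrCreate()
import spark.implicits._

// an untyped DataFrame like the one in the question
val df = Seq(
  ("thin", Seq("An", "attractive", "thin", "low", "profile")),
  ("Level", Seq("Lighter", "than", "metal", "Level", "and", "tes"))
).toDF("col1", "words")

// df.as[data] gives a typed Dataset, so the filter lambda
// works on plain Scala fields instead of Row accessors
val ds = df.as[data]
val filtered = ds.filter(row => row.words.exists(_.equalsIgnoreCase("level")))
filtered.show(false)
```

equalsIgnoreCase avoids building a lowercased copy of each element, and exists stops at the first match.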

I hope the answer is helpful