
Using Spark 2.4.1, I'm trying to get a value from a MapType column by key in a case-insensitive fashion, but Spark does not seem to honor spark.sql.caseSensitive=false.

Starting spark with: spark-shell --conf spark.sql.caseSensitive=false

Given this DataFrame:

val df = List(Map("a" -> 1), Map("A" -> 2)).toDF("m")
+--------+
|       m|
+--------+
|[a -> 1]|
|[A -> 2]|
+--------+

Executing any of these returns only one row (the match is case-sensitive for the map keys, but case-insensitive for the column name):

df.filter($"M.A".isNotNull).count
df.filter($"M"("A").isNotNull).count
df.filter($"M".getField("A").isNotNull).count

Is there a way to get the field resolution to be case insensitive when resolving a key in a map?

Update: I dug into the Spark code and found that this is probably a bug/missing feature. GetMapValue (complexTypeExtractors.scala) compares keys with plain StringType ordering instead of using the case-insensitive Resolver the way GetStructField does.

I filed a JIRA for this: SPARK-27820
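To illustrate the difference, here is a minimal plain-Scala sketch (not Spark's actual internals) of what a resolver-aware map lookup would look like; `caseInsensitiveResolution` mirrors the role of the Resolver that GetStructField uses, and all names here are illustrative:

```scala
object CaseInsensitiveLookup {
  // A Resolver decides whether two names refer to the same thing.
  type Resolver = (String, String) => Boolean

  // Case-insensitive resolution, analogous to what spark.sql.caseSensitive=false implies.
  val caseInsensitiveResolution: Resolver = (a, b) => a.equalsIgnoreCase(b)

  // Return the first value whose key matches the probe key under the resolver.
  def getMapValue[V](m: Map[String, V], key: String, resolver: Resolver): Option[V] =
    m.collectFirst { case (k, v) if resolver(k, key) => v }
}
```

With this, both `getMapValue(Map("a" -> 1), "A", caseInsensitiveResolution)` and `getMapValue(Map("A" -> 2), "a", caseInsensitiveResolution)` find a value, which is the behavior the question expects from Spark.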


1 Answer


Not exactly pretty, but it should do the trick (map_from_arrays and transform are available since Spark 2.4):

import org.apache.spark.sql.functions._

df.select(
  // Re-create the map
  map_from_arrays(
    // Convert keys to uppercase
    expr("transform(map_keys(m), x -> upper(x))"),
    // Values
    map_values($"m")
  )("A".toUpperCase)
)
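Outside Spark, the same key-normalization trick boils down to: uppercase every key when building the map, then probe with an uppercased key. A plain-Scala sketch of that equivalence (the function name is illustrative, not a Spark API):

```scala
// Sketch of the workaround's idea: normalize all map keys to uppercase,
// then look up with an uppercased key.
def lookupIgnoreCase[V](m: Map[String, V], key: String): Option[V] =
  m.map { case (k, v) => k.toUpperCase -> v }.get(key.toUpperCase)
```

One caveat, which applies equally to the Spark version above: if two keys collide after uppercasing (e.g. "a" and "A" in the same map), only one of them survives the normalization.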