Scala Spark - empty map on DataFrame column for map(String, Int)

I am joining two DataFrames that have columns of type Map[String, Int].

I want the merged DataFrame to have an empty map (Map()) rather than null in the Map-typed columns.

import org.apache.spark.sql.functions.{coalesce, col, lit}
import org.apache.spark.sql.types.{IntegerType, MapType, StringType}

val df = dfmerged.select(
  col("id"),
  coalesce(col("map_1"), lit(null).cast(MapType(StringType, IntegerType))).alias("map_1"),
  coalesce(col("map_2"), lit(Map.empty[String, Int])).alias("map_2"))

For the map_1 column a null is inserted, but I'd like to have an empty map there as well; the map_2 expression gives me this error:

java.lang.RuntimeException: Unsupported literal type class scala.collection.immutable.Map$EmptyMap$ Map()

I've also tried with a udf function like:

case class myStructMap(x:Map[String, Int])
val emptyMap = udf(() => myStructMap(Map.empty[String, Int]))

That also did not work.
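
(For reference, a plain udf that returns Map.empty[String, Int] directly, rather than wrapping it in a case class — which yields a struct column, not a map — does appear to work as a workaround. A rough, untested sketch:)

import org.apache.spark.sql.functions.{coalesce, col, udf}

// Sketch only: the udf's return type is inferred as map<string,int>,
// so it can be coalesced with the question's map_2 column.
val emptyIntMap = udf(() => Map.empty[String, Int])

val patched = dfmerged.withColumn("map_2", coalesce(col("map_2"), emptyIntMap()))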

When I try something like:

.select( coalesce(col("myMapCol"), lit(map())).alias("brand_viewed_count")...

or

.select(coalesce(col("myMapCol"), lit(map().cast(MapType(LongType, LongType)))).alias("brand_viewed_count")...

I get the error:

cannot resolve 'map()' due to data type mismatch: cannot cast MapType(NullType,NullType,false) to MapType(LongType,IntType,true);

1 Answer

6 votes

    In Spark 2.2

    import org.apache.spark.sql.functions.{coalesce, typedLit}
    import spark.implicits._   // assumes a SparkSession named spark

    val df = Seq((1L, null), (2L, Map("foo" -> 1))).toDF("id", "map")

    df.withColumn("map", coalesce($"map", typedLit(Map[String, Int]()))).show
    // +---+-------------+
    // | id|          map|
    // +---+-------------+
    // |  1|        Map()|
    // |  2|Map(foo -> 1)|
    // +---+-------------+
    

    Before 2.2:

    df.withColumn("map", coalesce($"map", map().cast("map<string,int>"))).show
    // +---+-----------------+
    // | id|              map|
    // +---+-----------------+
    // |  1|            Map()|
    // |  2|Map(foobar -> 42)|
    // +---+-----------------+
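
    Applied to the dfmerged select from the question (column names taken from there), the Spark 2.2 version would look roughly like this untested sketch:

    import org.apache.spark.sql.functions.{coalesce, col, typedLit}

    // Sketch only: fill both map columns with an empty Map[String, Int] when null.
    val result = dfmerged.select(
      col("id"),
      coalesce(col("map_1"), typedLit(Map.empty[String, Int])).alias("map_1"),
      coalesce(col("map_2"), typedLit(Map.empty[String, Int])).alias("map_2"))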
    
