I am new to Scala and having difficulty figuring out the issue with this code.

x.map{case (x1: Any, x2: Any,x3: String) => x1}.count()

It throws the following error:

scala.MatchError: null

This is the definition of x:

scala> x.cache()
res111: x.type = MapPartitionsRDD[522] at map at <console>:49

scala> x
res109: org.apache.spark.rdd.RDD[(Any, Any, String)] = MapPartitionsRDD[522] at map at <console>:49

scala> x.count()
res112: Long = 64508825

Any pointers will be appreciated.

1
That means you have null in your data. Why not just x.map(_._1).count ? - philantrovert
@philantrovert it will give nullpointer exception - Ramesh Maharjan
@RameshMaharjan Nope, it shouldn't. - philantrovert
I just tried it ;) val x = sc.parallelize(Seq(("3", 1, "t"), (3.0, "1", "t"), null)) which also gives the same error that the OP has encountered - Ramesh Maharjan

1 Answer

The error message

scala.MatchError: null

clearly indicates that at least one element of x is null instead of an (Any, Any, String) tuple.

So you should filter out the null values before the count:

x.filter(_ != null).map{case (x1: Any, x2: Any,x3: String) => x1}.count()
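
A minimal local sketch of the same behaviour, assuming a plain Scala Seq stands in for the RDD. The sample values are borrowed from the comment above; the object and value names are made up for illustration.

```scala
import scala.util.Try

object NullFilterSketch {
  // Hypothetical sample data: two valid tuples and one null element.
  val data: Seq[(Any, Any, String)] =
    Seq(("3", 1, "t"), (3.0, "1", "t"), null)

  // Without the filter, the tuple pattern cannot match the null element,
  // so the map throws scala.MatchError; Try captures it as a Failure.
  val unfiltered: Try[Seq[Any]] =
    Try(data.map { case (x1: Any, x2: Any, x3: String) => x1 })

  // Dropping nulls first leaves only elements the pattern can match.
  val firsts: Seq[Any] =
    data.filter(_ != null).map { case (x1: Any, x2: Any, x3: String) => x1 }

  def main(args: Array[String]): Unit = {
    println(unfiltered.isFailure) // true
    println(firsts.size)          // 2
  }
}
```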


unsure of null

If you are not sure whether your data contains null values, you can add a fallback case to the match and filter out the fallback results afterwards:

x.map {
  case (x1: Any, x2: Any, x3: String) => x1
  case _ => "not matched"
}.filter(_ != "not matched").count()
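
One caveat with the sentinel string: a row whose first field happens to be the string "not matched" would be filtered out as well. A possible alternative, sketched here on a plain Seq as a stand-in for the RDD, is collect with a partial function, which keeps only the elements the pattern matches. RDD also has a collect overload that takes a PartialFunction. The object name and sample data below are hypothetical.

```scala
object CollectSketch {
  // Hypothetical sample data: three valid tuples and one null element.
  // The last tuple's first field equals the sentinel string on purpose.
  val data: Seq[(Any, Any, String)] =
    Seq(("3", 1, "t"), (3.0, "1", "t"), null, ("not matched", 2, "u"))

  // collect applies the partial function only where it is defined,
  // so the null element is skipped without a MatchError.
  val viaCollect: Seq[Any] =
    data.collect { case (x1: Any, x2: Any, x3: String) => x1 }

  // The sentinel approach, for comparison: it also drops the real
  // row whose first field is "not matched".
  val viaSentinel: Seq[Any] =
    data.map {
      case (x1: Any, x2: Any, x3: String) => x1
      case _ => "not matched"
    }.filter(_ != "not matched")

  def main(args: Array[String]): Unit = {
    println(viaCollect.size)  // 3
    println(viaSentinel.size) // 2
  }
}
```

On the RDD from the question this would read x.collect { case (x1: Any, x2: Any, x3: String) => x1 }.count(), which avoids the separate filter step.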