I ran the following spark-shell exercise:
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 2.1.0
/_/
Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_121)
Type in expressions to have them evaluated.
Type :help for more information.
scala> case class Test(notNullable:String, nullable:Option[String])
defined class Test
scala> val myArray = Array(
| Test("x", None),
| Test("y", Some("z"))
| )
myArray: Array[Test] = Array(Test(x,None), Test(y,Some(z)))
scala> val rdd = sc.parallelize(myArray)
rdd: org.apache.spark.rdd.RDD[Test] = ParallelCollectionRDD[0] at parallelize at <console>:28
scala> rdd.toDF.printSchema
root
|-- notNullable: string (nullable = true)
|-- nullable: string (nullable = true)
I've read (in Spark in Action) that, given a case class with Option fields, the non-Option fields should be inferred as not nullable. Is that actually true? If so, what am I doing wrong here?
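For comparison, here is a minimal variant of the same experiment using a primitive field instead of String. This is a sketch I would run in the same spark-shell session (the class name TestPrim is mine, not from the book); my understanding is that Spark's reflection-based schema inference only distinguishes Option from non-Option for primitive types like Int, while reference types such as String are always marked nullable:

```scala
// Hypothetical follow-up in the same spark-shell session:
// does Option vs. non-Option matter for a primitive (Int) field?
case class TestPrim(notNullable: Int, nullable: Option[Int])

sc.parallelize(Seq(TestPrim(1, None), TestPrim(2, Some(3)))).toDF.printSchema
// Expectation (unverified): the Int field comes out nullable = false,
// the Option[Int] field nullable = true, whereas both String fields
// above were inferred nullable = true regardless of Option.
```

If that expectation holds, it would suggest the book's claim applies to primitives rather than to all field types.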