2
votes

I have two RDD[Int] values, sources and noSourcesVertex. I would like to map each of them into a new RDD and then combine the two results:

  val sourcesFormatted = sources.map(x => (Some(x), (Some(x), Some(x))))
  val noSourcesVertexFormatted = noSourcesVertex.map(x => (Some(x), (Some(x), None)))
  val outInit = sourcesFormatted.union(noSourcesVertexFormatted)

But when I execute the code above, I get an error:

  error: type mismatch;
   found   : org.apache.spark.rdd.RDD[(Some[Int], (Some[Int], None.type))]
   required: org.apache.spark.rdd.RDD[(Some[Int], (Some[Int], Some[Int]))]
    val outInit = sourcesFormatted.union(noSourcesVertexFormatted)

I think this error happens because I'm trying to union two RDDs whose third field has a different type.

I wasn't expecting this behaviour because, as I understood the Option mechanism, Some(something) and None have the same type: Option.

Why do I get this error?


2 Answers

5
votes

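RDDs are invariant in their element type, and the compiler infers the narrowest types it can for each map result (Some[Int] and None.type), so you have to be specific about the types: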

import org.apache.spark.rdd.RDD

val sourcesFormatted: RDD[(Option[Int], (Option[Int], Option[Int]))] = 
  sources.map(x => (Some(x), (Some(x), Some(x))))
val noSourcesVertexFormatted: RDD[(Option[Int], (Option[Int], Option[Int]))] = 
  noSourcesVertex.map(x => (Some(x), (Some(x), None)))

or

val sourcesFormatted = 
  sources.map(x => (Some(x), (Some(x), Some(x): Option[Int])))
val noSourcesVertexFormatted = 
  noSourcesVertex.map(x => (Some(x), (Some(x), None: Option[Int])))
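
To see the invariance issue without a Spark cluster, here is a minimal sketch in plain Scala. Box is a hypothetical stand-in for RDD (it is not a Spark class), declared invariant in T just like RDD[T], and the Row alias is only introduced here for readability. The sample values are made up.

```scala
// Hypothetical stand-in for RDD: invariant in T, like org.apache.spark.rdd.RDD[T].
final class Box[T](val items: Seq[T]) {
  def map[U](f: T => U): Box[U] = new Box(items.map(f))
  // Both sides must have exactly the same element type T.
  def union(other: Box[T]): Box[T] = new Box(items ++ other.items)
}

object InvarianceSketch {
  val sources = new Box(Seq(1, 2))
  val noSourcesVertex = new Box(Seq(3))

  // Without annotations the element types would be inferred as
  //   (Some[Int], (Some[Int], Some[Int]))  and  (Some[Int], (Some[Int], None.type)),
  // and union would be rejected with a type mismatch, as in the question.

  // Alias for the widened element type (an assumption made for this sketch).
  type Row = (Option[Int], (Option[Int], Option[Int]))

  // The annotation makes map produce Box[Row] on both sides.
  val sourcesFormatted: Box[Row] =
    sources.map(x => (Some(x), (Some(x), Some(x))))
  val noSourcesVertexFormatted: Box[Row] =
    noSourcesVertex.map(x => (Some(x), (Some(x), None)))

  // Same element type on both sides, so union compiles.
  val outInit: Box[Row] = sourcesFormatted.union(noSourcesVertexFormatted)
}
```

The same reasoning should carry over to the real RDD, since its union also takes an RDD with exactly the same element type.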
3
votes

Some and None are both subtypes of Option, not the other way around.

Option(something) 

will return

Some(something)

But

Option(null) 

will return

None

Whereas

Some(null)

will not return

None 

Some is a case class extending Option; it says the option is not empty and holds a value.
None is a case object extending Option; it says the option is empty, and trying to get the value throws NoSuchElementException.
And Option itself is an abstract class; calling Option(x) on its companion object wraps a non-null value as Some and a null value as None.
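
A small sketch of the behaviour described above, using only the standard library. The object name OptionSketch and the value names are made up for illustration.

```scala
import scala.util.{Failure, Try}

object OptionSketch {
  // A null reference, used only to show how the two constructors differ.
  val name: String = null

  // Option.apply checks for null and yields None.
  val wrapped: Option[String] = Option(name)

  // Some never checks; it simply holds whatever it is given, including null.
  val forced: Option[String] = Some(name)

  // Calling get on an empty Option throws NoSuchElementException.
  val empty: Option[Int] = None
  val getOnNoneThrows: Boolean = Try(empty.get) match {
    case Failure(_: NoSuchElementException) => true
    case _                                  => false
  }
}
```

This is why Option(x) is usually the safer choice when x may come from Java code that can return null.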