
Let's say we have a Spark Streaming job that runs every 5 seconds. It contains a foreachRDD statement inside which we broadcast a variable. Will the variable be broadcast again for each RDD, even though it has not changed?

Second, suppose that on some condition, say after 1 hour, I update this broadcast variable (meaning the reference data structure it points to) by calling unpersist() and then re-broadcasting it. Will the new value also be broadcast only once to all workers, or multiple times, i.e. once in each foreachRDD iteration?

1 Answer

If you create the broadcast variable inside the foreachRDD call:

stream.foreachRDD(rdd => {
  val broadcast = ???
  ...
})

this variable is created and transferred for each batch. It doesn't matter whether the wrapped value has changed or not.
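For contrast, here is a minimal sketch that creates the broadcast once on the driver, outside foreachRDD, so every batch reuses the same handle. The application name, the socket source on localhost:9999, and the Map contents are assumptions made for illustration; they are not part of the question.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object BroadcastOnce {
  def main(args: Array[String]): Unit = {
    // Hypothetical local setup with a 5-second batch interval.
    val conf = new SparkConf().setAppName("BroadcastOnce").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(5))

    // Hypothetical reference data, broadcast a single time before the stream starts.
    val lookup = ssc.sparkContext.broadcast(Map("a" -> 1, "b" -> 2))

    // Hypothetical source: lines of text read from a local socket.
    val stream = ssc.socketTextStream("localhost", 9999)

    stream.foreachRDD { rdd =>
      // The closure captures only the broadcast handle, not the Map itself.
      val hits = rdd.filter(word => lookup.value.contains(word)).count()
      println(s"known words in this batch: $hits")
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Executors fetch a broadcast value the first time a task reads it and then keep it cached, so reusing one handle avoids shipping the data again for every batch.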

You should also remember that broadcast variables are not fully reliable when used in Spark Streaming (they cannot be recovered from a checkpoint), and that, in general, a broadcast value shouldn't be modified.
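For the hourly refresh described in the question, one possible approach is a lazily initialized holder that replaces the broadcast only when it is stale, rather than modifying it in place. This is a sketch under assumptions: the names RefreshableLookup, load, and countKnown, the one-hour interval, and the Map contents are all hypothetical. The body of foreachRDD runs on the driver, so get is called once per batch but creates a new broadcast only when the interval has elapsed.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.broadcast.Broadcast
import org.apache.spark.streaming.dstream.DStream

object RefreshableLookup {
  // Hypothetical refresh interval: one hour, in milliseconds.
  private val refreshIntervalMs = 60 * 60 * 1000L

  private var instance: Broadcast[Map[String, Int]] = null
  private var lastRefresh = 0L

  // Hypothetical loader; a real job would read the reference data from storage.
  private def load(): Map[String, Int] = Map("a" -> 1, "b" -> 2)

  // Returns the current broadcast, replacing it only when it is missing or stale.
  def get(sc: SparkContext): Broadcast[Map[String, Int]] = synchronized {
    val now = System.currentTimeMillis()
    if (instance == null || now - lastRefresh > refreshIntervalMs) {
      // Drop the cached copies of the old value on the executors, without blocking.
      if (instance != null) instance.unpersist(blocking = false)
      instance = sc.broadcast(load())
      lastRefresh = now
    }
    instance
  }

  // Example use: look up the handle on the driver once per batch.
  def countKnown(stream: DStream[String]): Unit =
    stream.foreachRDD { rdd =>
      val lookup = get(rdd.sparkContext)
      val hits = rdd.filter(word => lookup.value.contains(word)).count()
      println(s"known words in this batch: $hits")
    }
}
```

Because the holder is initialized lazily, it is also recreated on first use after a restart from a checkpoint. The Spark Streaming programming guide describes a similar singleton pattern for broadcast variables.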