While exploring Spark accumulators, I tried to understand and demonstrate the difference between an accumulator and a regular variable in Spark, but the output does not match my expectation. Both the accumulator and the counter have the same value at the end of the program, and I am able to read the accumulator inside a transformation function (according to the docs, only the driver can read an accumulator's value). Am I doing something wrong? Is my understanding correct?
import org.apache.spark.sql.SparkSession

object Accmulators extends App {
  val spark = SparkSession.builder().appName("Accmulators").master("local[3]").getOrCreate()
  val cntAccum = spark.sparkContext.longAccumulator("counterAccum")
  val file = spark.sparkContext.textFile("resources/InputWithBlank")
  var counter = 0 // regular variable, for comparison with the accumulator

  def countBlank(line: String): Array[String] = {
    val trimmed = line.trim
    if (trimmed == "") {
      cntAccum.add(1)
      cntAccum.value // reading the accumulator inside a transformation
      counter += 1
    }
    line.split(" ")
  }

  file.flatMap(line => countBlank(line)).collect()
  println(cntAccum.value)
  println(counter)
}
The input file contains text with 9 empty lines in between:
4) Aggregations and Joins
5) Spark SQL
6) Spark App tuning
Output:
Both counter and cntAccum give the same result.
