How do I iterate over a Dataset in Spark 2.0 with Scala? I need to compare consecutive rows: for each row N, compare DateN with DateN-1 and calculate the difference.
Row1 - Date1 Num1
Row2 - Date2 Num2
..
RowN - DateN NumN
I'm not sure whether you have already resolved this with a window function, since you only want to compare rows n and n-1 and I don't see an attribute on which you want to group the data. For the requirement as described, the following working example shows one way to approach it:
import org.apache.spark.sql.SparkSession

val spark = SparkSession
  .builder
  .appName("Example")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Sample data: (customer, purchase date, amount)
val customers = spark.sparkContext.parallelize(List(
  ("Alice", "2016-05-01", 50.00),
  ("Alice", "2016-05-03", 45.00),
  ("Alice", "2016-05-04", 55.00),
  ("Bob", "2016-05-01", 25.00),
  ("Bob", "2016-05-04", 29.00),
  ("Bob", "2016-05-06", 27.00)))

// Attach a 0-based index to every row and collect the result to the driver
val custIndexed = customers.zipWithIndex().collect()

// Split the collected rows by index parity
val custOdd = custIndexed.filter(record => record._2 % 2 != 0)
val custEven = custIndexed.filter(record => record._2 % 2 == 0)
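
Since the snippet above already collects the rows to the driver, one possible way to finish the n versus n-1 comparison is to pair each row with its predecessor using plain Scala collections. This is only a sketch and not necessarily how the answer continued: the `Purchase` and `Diff` case classes and the `consecutiveDiffs` helper are hypothetical names introduced here, and the code assumes the dates are ISO-8601 strings (`yyyy-MM-dd`) and that the data is small enough to fit in driver memory.

```scala
import java.time.LocalDate
import java.time.temporal.ChronoUnit

object ConsecutiveRows {
  // Hypothetical row type mirroring the (name, date, amount) tuples above.
  final case class Purchase(name: String, date: String, amount: Double)

  // Hypothetical result type: the current row's date, the number of days
  // since the previous row, and the change in amount.
  final case class Diff(date: String, daysSincePrev: Long, amountDelta: Double)

  // Sort by date, then pair every row with its predecessor via sliding(2).
  // Inputs with fewer than two rows produce no pairs and an empty result.
  def consecutiveDiffs(rows: Seq[Purchase]): Seq[Diff] =
    rows
      .sortBy(r => LocalDate.parse(r.date).toEpochDay)
      .sliding(2)
      .collect { case Seq(prev, curr) =>
        Diff(
          curr.date,
          ChronoUnit.DAYS.between(LocalDate.parse(prev.date), LocalDate.parse(curr.date)),
          curr.amount - prev.amount
        )
      }
      .toList

  def main(args: Array[String]): Unit = {
    val alice = Seq(
      Purchase("Alice", "2016-05-01", 50.00),
      Purchase("Alice", "2016-05-03", 45.00),
      Purchase("Alice", "2016-05-04", 55.00)
    )
    consecutiveDiffs(alice).foreach(println)
  }
}
```

For Alice's three rows this gives gaps of 2 days and 1 day. With the collected array from the example, the tuples could be mapped into `Purchase` first, for instance `custIndexed.map { case ((n, d, a), _) => Purchase(n, d, a) }`. For data too large to collect, the `lag` window function in `org.apache.spark.sql.functions` is the usual way to read the previous row without leaving Spark.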