2 votes

I am reading data from Kafka in a Spark Streaming application and performing two actions:

  1. Insert DStreams into HBase table A
  2. Update another HBase table B

I want to make sure that for each RDD in the DStream, the insert into HBase table A happens before the update on HBase table B (i.e., the two actions run sequentially for each RDD).

How can I achieve this in a Spark Streaming application?


2 Answers

2 votes

As far as I know, you can perform the above task in the following way.

The insert and the update will be performed sequentially:

    recordStream.foreachRDD { rdd =>                 // each batch RDD of Kafka records
      val records = rdd.map(line => line.split("\\|")).collect()
      records.foreach { record => /* write the code for the insert into HBase table A */ }
      records.foreach { record => /* write the code for the update of HBase table B */ }
    }
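Keep in mind that collect() pulls the whole batch back to the driver, so this only works when each batch is small. If the data may not fit on the driver, a sketch of the same idea built from two blocking actions could look like the following; insertIntoTableA and updateTableB are hypothetical stand-ins for your actual HBase write code. Because each foreachPartition call blocks until it completes, every insert into table A finishes before any update of table B starts:

    // Hypothetical helpers standing in for the real HBase write logic
    def insertIntoTableA(record: Array[String]): Unit = { /* HBase Put into table A */ }
    def updateTableB(record: Array[String]): Unit = { /* HBase Put/Increment on table B */ }

    recordStream.foreachRDD { rdd =>
      val parsed = rdd.map(_.split("\\|")).cache()          // parse once, reuse for both actions
      parsed.foreachPartition(_.foreach(insertIntoTableA))  // action 1: blocks until every insert is done
      parsed.foreachPartition(_.foreach(updateTableB))      // action 2: starts only after action 1 returns
      parsed.unpersist()
    }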

Hope this helps.

0 votes

Update both tables sequentially inside a single rdd.foreach(). The two writes will be executed in order, provided you handle exceptions properly.

This behavior is backed by the fact that its DAG will be executed within the same stage, so the statements run sequentially.
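A minimal sketch of that pattern, assuming the same hypothetical insertIntoTableA and updateTableB helpers as in the first answer for the actual HBase writes:

    recordStream.foreachRDD { rdd =>
      rdd.foreach { line =>
        val record = line.split("\\|")
        try {
          insertIntoTableA(record)  // step 1: insert into table A
          updateTableB(record)      // step 2: runs only if the insert did not throw
        } catch {
          case e: Exception =>
            // handle/log the failure; the update is skipped when the insert fails
        }
      }
    }

Note that this guarantees the insert-before-update order per record. If every insert into table A must complete before any update of table B begins, split the work into two separate actions as shown in the first answer.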