We have a Spark cluster running under MemSQL, with several different pipelines running. The ETL setup is as follows (a rough code sketch of the flow appears after the list):
- Extract: Spark reads messages from the Kafka cluster (using MemSQL's Kafka-ZooKeeper setup)
- Transform: a custom jar deployed for this step
- Load: data from the transform stage is loaded into a columnstore table
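For reference, the flow is roughly equivalent to the sketch below. This uses the plain Spark Streaming Kafka direct API rather than the MemSQL wrappers, and `transformRecord`, `loadToColumnstore`, the broker address, and the topic name are all placeholders standing in for our actual code:

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object EtlSketch {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("etl"), Seconds(10))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "kafka-host:9092",
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "etl-pipeline",
      // Auto-commit disabled so offsets are only advanced explicitly.
      "enable.auto.commit" -> (false: java.lang.Boolean))

    // Extract: poll messages from the Kafka topic.
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("my-topic"), kafkaParams))

    stream.foreachRDD { rdd =>
      // Transform: stand-in for the custom jar's logic.
      val transformed = rdd.map(record => transformRecord(record.value))
      // Load: write the batch into the MemSQL columnstore table.
      loadToColumnstore(transformed)
    }

    ssc.start()
    ssc.awaitTermination()
  }

  // Placeholders for the real transform and load steps.
  def transformRecord(value: String): String = value
  def loadToColumnstore(rdd: RDD[String]): Unit = ()
}
```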
I have the following questions:
- What happens to a message polled from Kafka if the job fails in the transform stage? Does MemSQL take care of replaying that message, or is the data lost?
- If the data is lost, how can I solve this? Are there any configuration changes that need to be made? (A sketch of the pattern I have in mind follows.)
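To make the second question concrete: my understanding is that with the plain Kafka direct API, at-least-once delivery is achieved by disabling auto-commit and committing offsets only after the load succeeds. A sketch under that assumption, reusing `stream`, `transformRecord`, and `loadToColumnstore` from the snippet above (not our actual code):

```scala
import org.apache.spark.streaming.kafka010.{CanCommitOffsets, HasOffsetRanges}

stream.foreachRDD { rdd =>
  // Record which Kafka offsets this batch covers before processing.
  val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges

  val transformed = rdd.map(record => transformRecord(record.value))
  loadToColumnstore(transformed)

  // Commit offsets only after the load succeeds. If the transform or
  // load throws, the offsets stay uncommitted and the batch is re-read
  // from Kafka on restart (at-least-once semantics).
  stream.asInstanceOf[CanCommitOffsets].commitAsync(offsetRanges)
}
```

Is something like this the right direction, or does the MemSQL pipeline already handle replay on its own?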