We're using Spark Streaming connected to an AWS Kinesis stream to aggregate (per minute) the metrics we receive and to write the aggregations to InfluxDB, so they're available to a real-time dashboard.
Everything is working fine, but we're now considering how we should handle pauses for deployments and eventual failures of the system.
The docs say that the Kinesis integration library is already prepared for failures, checkpointing and so on, but I would like to clarify how checkpointing actually works there:
> The Kinesis receiver creates an input DStream using the Kinesis Client Library (KCL) provided by Amazon under the Amazon Software License (ASL). The KCL builds on top of the Apache 2.0 licensed AWS Java SDK and provides load-balancing, fault-tolerance, checkpointing through the concepts of Workers, Checkpoints, and Shard Leases.
We can define the checkpoint interval for Kinesis, but as far as I understand it is only used to mark the point in the stream up to which we've already consumed records. So we still need to use Spark Streaming's own checkpointing feature, right?
As we're aggregating the data per minute, our batch interval is 60 seconds, but during those 60 seconds we're continuously receiving data from the stream.
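For reference, this is roughly how we set up the context and the receiver (a minimal sketch; the application name, stream name, region and checkpoint directory below are placeholders, not our real values):

```java
import com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitialPositionInStream;
import org.apache.spark.SparkConf;
import org.apache.spark.storage.StorageLevel;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaReceiverInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kinesis.KinesisUtils;

public class MetricsJob {
    public static void main(String[] args) throws Exception {
        SparkConf conf = new SparkConf().setAppName("metrics-aggregator");
        // 1-minute batches, matching our per-minute aggregation
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(60));

        // Spark Streaming's own checkpointing (metadata + state), on durable storage
        jssc.checkpoint("s3://my-bucket/spark-checkpoints");

        // Kinesis receiver; the Duration argument is the KCL checkpoint interval,
        // i.e. how often the worker records its position (sequence numbers) in DynamoDB
        JavaReceiverInputDStream<byte[]> kinesisStream = KinesisUtils.createStream(
                jssc,
                "metrics-app",                              // KCL application name (DynamoDB table)
                "metrics-stream",                           // Kinesis stream name
                "https://kinesis.us-east-1.amazonaws.com",  // endpoint
                "us-east-1",                                // region
                InitialPositionInStream.LATEST,
                Durations.seconds(60),                      // Kinesis checkpoint interval
                StorageLevel.MEMORY_AND_DISK_2());

        // ... per-minute aggregation and the InfluxDB write go here ...

        jssc.start();
        jssc.awaitTermination();
    }
}
```

(On restart I assume we'd need `JavaStreamingContext.getOrCreate(checkpointDir, ...)` to pick the checkpoint up again, which is part of what I'm trying to confirm below.)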
Here are my questions:
- When I execute JavaStreamingContext.stop(...) (in order to deploy a new version of the job), will the receiver be stopped and the checkpoint updated at the end? (See the stop sketch after this list for how we currently call it.)
- When does the Spark Streaming checkpoint happen? After every batch execution? Before it?
- Assuming we have both checkpoints working, how can we guarantee consistency in case of failure? It seems that every time Spark Streaming checkpoints, it also needs to checkpoint to Kinesis at the same moment; otherwise we could end up reading the same data again. How can we handle this?
- If the underlying service (in this case InfluxDB) is down, what should I do? Implement a retry mechanism (something like the sketch after this list)? If so, it needs to stop retrying after a while, otherwise we'll run out of memory.
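For the stop question, this is how we currently shut the job down before a deploy (just a sketch; whether these flags give us the checkpoint behaviour we want is exactly what I'm asking):

```java
// Stop the streaming context (and the SparkContext) gracefully, so that
// in-flight batches are completed before shutdown.
jssc.stop(true /* stopSparkContext */, true /* stopGracefully */);
```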
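And for the InfluxDB question, the kind of bounded retry I have in mind would sit where the "aggregation and write" placeholder is in the job above. Purely a sketch: `aggregatedPerMinute`, `MetricPoint` and `writeToInflux(...)` are placeholders for our own DStream, record type and write helper, and the limits are made-up numbers:

```java
aggregatedPerMinute.foreachRDD(rdd ->
    rdd.foreachPartition(partition -> {
        // Materialise the partition so the same records can be retried safely
        List<MetricPoint> batch = new ArrayList<>();
        partition.forEachRemaining(batch::add);

        int attempts = 0;
        final int maxAttempts = 5;              // made-up limit
        while (true) {
            try {
                writeToInflux(batch);           // placeholder for our real InfluxDB client call
                return;
            } catch (Exception e) {
                if (++attempts >= maxAttempts) {
                    throw e;                    // give up instead of retrying (and buffering) forever
                }
                Thread.sleep(1000L * attempts); // simple linear backoff between attempts
            }
        }
    })
);
```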
Thanks in advance!