0
votes

I am pulling a data stream from RabbitMQ using Apache Flink 1.10.0, and I am currently using the default in-memory checkpoint configuration. To make the job recover when a task manager restarts, I need to store state and checkpoints in a filesystem. All the demos say to use "hdfs://namenode:4000/....", but I have no HDFS cluster; my Apache Flink is running in a Kubernetes cluster. How can I store my checkpoints in a filesystem?

I read the Apache Flink docs, which say it supports:

  • A persistent (or durable) data source that can replay records for a certain amount of time. Examples for such sources are persistent messages queues (e.g., Apache Kafka, RabbitMQ, Amazon Kinesis, Google PubSub) or file systems (e.g., HDFS, S3, GFS, NFS, Ceph, …).

  • A persistent storage for state, typically a distributed filesystem (e.g., HDFS, S3, GFS, NFS, Ceph, …)

How do I configure Flink to use NFS to store checkpoints and state? I searched the internet and found nothing about this solution.


1 Answer

1
vote

To use NFS for checkpointing with Flink, specify a checkpoint directory using a file: URI that is accessible from every node in the cluster (the job manager and all task managers must be able to access it via the same URI).

So, for example, you might mount your NFS volume at /data/flink/checkpoints on each machine, and then specify:

state.checkpoints.dir: file:///data/flink/checkpoints
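You will usually want a state backend configured alongside the checkpoint directory. A minimal flink-conf.yaml sketch for Flink 1.10 (the savepoint path is an assumption added here for completeness, not something the question requires):

```yaml
# flink-conf.yaml — sketch; paths are examples, adjust to your NFS mount point
state.backend: filesystem
state.checkpoints.dir: file:///data/flink/checkpoints
state.savepoints.dir: file:///data/flink/savepoints
```

The filesystem state backend keeps working state on the task manager heap and writes checkpoint snapshots to the configured directory, which is what makes recovery after a task manager restart possible.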
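Since the cluster runs on Kubernetes, the usual way to make the same path visible to every pod is a shared volume mounted into both the job manager and task manager pod specs. A hedged sketch, where the PVC name flink-checkpoints-pvc and the mount path are hypothetical examples (an NFS-backed PersistentVolume with ReadWriteMany access is assumed):

```yaml
# Pod spec fragment — apply the same volume/mount to both the
# jobmanager and taskmanager deployments so the URI resolves identically.
spec:
  containers:
    - name: taskmanager
      image: flink:1.10.0
      volumeMounts:
        - name: checkpoints
          mountPath: /data/flink/checkpoints
  volumes:
    - name: checkpoints
      persistentVolumeClaim:
        claimName: flink-checkpoints-pvc   # hypothetical NFS-backed PVC, ReadWriteMany
```

With this in place, file:///data/flink/checkpoints refers to the same NFS-backed directory from every node, satisfying the same-URI requirement above.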