I am working on building an application with the requirements below, and I am just getting started with Flink.
- Ingest data into Kafka with, say, 50 partitions (incoming rate: 100,000 msgs/sec)
- Read data from Kafka and process each message in real time (do some computation, compare with old data, etc.)
- Store the output in Cassandra
I was looking for a real-time streaming platform and found Flink to be a great fit for both real-time and batch processing.
- Do you think Flink is the best fit for my use case, or should I use Storm, Spark Streaming, or another streaming platform?
- Do I need to write a data pipeline in Google Dataflow to execute my sequence of steps on Flink, or is there another way to run a sequence of steps for real-time streaming?
- Say each computation takes around 20 milliseconds; how can I best design this with Flink to get better throughput?
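For context on the throughput question, here is my back-of-envelope arithmetic (assuming, perhaps pessimistically, that each message is processed fully sequentially; the numbers come from my requirements above):

```java
public class ThroughputEstimate {
    public static void main(String[] args) {
        // From my requirements: 100,000 msgs/sec incoming, ~20 ms per computation
        double msgsPerSec = 100_000;
        double secondsPerMsg = 0.020; // 20 ms

        // One sequential thread can handle 1 / 0.020 = 50 msgs/sec,
        // so I would need roughly rate * latency concurrent computations.
        double threadCapacity = 1 / secondsPerMsg;             // 50 msgs/sec
        double requiredParallelism = msgsPerSec * secondsPerMsg; // 2000

        System.out.println("One thread handles " + threadCapacity + " msgs/sec");
        System.out.println("Required parallelism ~= " + requiredParallelism);
    }
}
```

So it seems I would need on the order of 2,000 concurrent computations to keep up, which is why I am asking how to design this properly.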
- Can I look up data in Redis or Cassandra from within Flink for each computation?
- Will I be able to use a JVM in-memory cache inside Flink?
- Also, can I aggregate data by key over a time window (for example, 5 seconds)? For instance, if 100 messages come in and 10 of them share the same key, can I group all messages with the same key together and process them as one batch?
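To make concrete what I mean by grouping, here is a plain-Java mock (no Flink involved; all names are mine) of the behavior I am after: buffer the messages that arrive during one window per key, then process each key's batch together when the window closes:

```java
import java.util.*;

public class KeyedWindowMock {
    // Group all messages from one window by their key,
    // so each key's batch can be processed in a single call.
    static Map<String, List<String>> groupByKey(List<String[]> windowMsgs) {
        Map<String, List<String>> groups = new HashMap<>();
        for (String[] kv : windowMsgs) {
            // kv[0] is the key, kv[1] is the message payload
            groups.computeIfAbsent(kv[0], k -> new ArrayList<>()).add(kv[1]);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Pretend these three messages arrived within one 5-second window:
        List<String[]> window = Arrays.asList(
            new String[]{"user1", "a"},
            new String[]{"user2", "b"},
            new String[]{"user1", "c"});
        // When the window closes, I want one invocation per key:
        groupByKey(window).forEach((key, msgs) ->
            System.out.println(key + " -> " + msgs));
    }
}
```

Is this the kind of thing Flink's keyed windows are meant for, and if so, what is the idiomatic way to express it?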
- Are there any tutorials on Flink best practices?
Thanks, and I appreciate all your help.