I have a use case where I read a set of key/value pairs, where the key is a plain String and the value is a JSON document. I have to expose these values as JSON to a REST endpoint, which I plan to do with a Kafka streaming consumer.
Now my questions are:
How do I deal with Kafka partitions? I'm planning to use Spark Streaming for the consumer; a rough sketch of what I have in mind is below.
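For reference, this is roughly what I'm picturing for the Spark Streaming side, assuming the spark-streaming-kafka-0-10 integration (the broker address, topic name and group id are made up, and the REST push is just a placeholder `println`):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010._
import org.apache.kafka.common.serialization.StringDeserializer

object JsonConsumer {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("json-consumer")
    val ssc  = new StreamingContext(conf, Seconds(5))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "localhost:9092",          // assumed broker address
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "rest-endpoint-group",     // assumed group id
      "auto.offset.reset"  -> "latest"
    )

    // The direct stream creates one Spark partition per Kafka partition,
    // so partition handling is largely taken care of by the integration.
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("kv-topic"), kafkaParams)
    )

    stream.foreachRDD { rdd =>
      rdd.foreach { record =>
        // push record.value (the JSON) to the REST layer here
        println(s"${record.key} -> ${record.value}")
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```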
What about the producer? I would like to poll data from an external service at a fixed interval and write the resulting key/value pairs to the Kafka topic. Is there such a thing as a streaming producer?
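My current thinking is that a plain `KafkaProducer` driven by a scheduled executor would cover this; something like the sketch below, where `fetchFromExternalService`, the topic name and the 30-second interval are all placeholders for my actual setup:

```scala
import java.util.Properties
import java.util.concurrent.{Executors, TimeUnit}
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object PollingProducer {
  // Placeholder for the actual call to the external service;
  // it would return a (key, jsonValue) pair.
  def fetchFromExternalService(): (String, String) =
    ("some-key", """{"field":"value"}""")

  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "localhost:9092")  // assumed broker address
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

    val producer  = new KafkaProducer[String, String](props)
    val scheduler = Executors.newSingleThreadScheduledExecutor()

    // Poll the external service every 30 seconds and publish the result.
    scheduler.scheduleAtFixedRate(new Runnable {
      override def run(): Unit = {
        val (key, json) = fetchFromExternalService()
        producer.send(new ProducerRecord[String, String]("kv-topic", key, json))
      }
    }, 0, 30, TimeUnit.SECONDS)
  }
}
```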
Is this even a valid use case for Kafka? For example, I could have another consumer group that just logs the incoming key/value pairs to a database. This is exactly what attracts me to Kafka: the possibility of having multiple consumer groups doing different things with the same stream!
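For instance, the database-logging consumer could be as simple as the sketch below; the distinct `group.id` is the important part, since (as I understand it) each consumer group gets its own full copy of the topic. The broker address, topic name and `saveToDatabase` are placeholders:

```scala
import java.util.Properties
import org.apache.kafka.clients.consumer.KafkaConsumer
import scala.collection.JavaConverters._

object DbLogger {
  // Placeholder for the actual persistence logic.
  def saveToDatabase(key: String, json: String): Unit =
    println(s"INSERT $key -> $json")

  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "localhost:9092")  // assumed broker address
    // A distinct group.id means this consumer reads its own full copy of the
    // topic, independently of the REST-facing consumer group.
    props.put("group.id", "db-logger-group")
    props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
    props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")

    val consumer = new KafkaConsumer[String, String](props)
    consumer.subscribe(java.util.Arrays.asList("kv-topic"))

    while (true) {
      val records = consumer.poll(100L)  // poll timeout in milliseconds
      for (record <- records.asScala)
        saveToDatabase(record.key, record.value)
    }
  }
}
```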
I suppose partitioning a topic exists to increase parallelism and thereby consumer throughput. How does that throughput compare with a single-partition topic? I have a use case where I have to ensure ordering, so I cannot simply fan records out across partitions, but at the same time I would like very high consumer throughput. How do I go about this? See the sketch below for what I mean.
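From what I've read, Kafka only guarantees ordering within a single partition, not across a topic. So if ordering only needs to hold per key (say, per entity), I could keep many partitions and still preserve it, because the default partitioner hashes the record key so that all records with the same key land on the same partition. A sketch of what I mean, reusing the producer from above (`entityId` is a hypothetical ordering key):

```scala
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

// With the default partitioner, the partition is chosen by hashing the key,
// so every record for a given key goes to the same partition and stays in
// order there, while different keys spread across partitions for parallelism.
def sendInOrder(producer: KafkaProducer[String, String],
                entityId: String, json: String): Unit =
  producer.send(new ProducerRecord[String, String]("kv-topic", entityId, json))
```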
Any suggestions?