6
votes

As far as I understand from the Kafka Streams documentation, it is not possible to stream data from only one partition of a given topic; one always has to read the whole topic.

Is that correct?

If so, are there any plans to add such an option to the API in the future?

4
The question is not entirely clear to me. The source of your streaming application can be a topic with only one partition. But it's possible I haven't understood the question; can you elaborate, please? - ppatierno
I will give an example. Let's assume that I have topic "A" with 10 partitions and I want to stream data from this topic, but only from partition 4, without reading data from the other partitions. - Purple
Then you need to copy only the data in partition 4 into another topic with only 1 partition and use that as input to Streams. - Hans Jespersen

4 Answers

5
votes

No, you can't do that: the internal consumer subscribes to the topic and joins a consumer group, which is identified by the application.id, so the partitions are assigned automatically. By the way, why do you want to do that? Without rebalancing you lose the scalability that Kafka Streams provides: simply by adding or removing instances of your streaming application you can scale the entire process, thanks to the rebalancing of partitions.
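To illustrate why the assignment is out of the application's hands, here is a minimal sketch of partitions being spread over the instances of a group. It is a toy round-robin model only: the class GroupAssignmentSketch, the method assign and the instance names are made up for this example, and Kafka's real assignors are more involved.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Arrays;

// Toy model only: NOT Kafka's actual assignment logic.
class GroupAssignmentSketch {

    // Spread partitions 0..numPartitions-1 over the instances round-robin.
    static Map<String, List<Integer>> assign(int numPartitions, List<String> instances) {
        if (instances.isEmpty()) {
            throw new IllegalArgumentException("at least one instance is required");
        }
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (String instance : instances) {
            assignment.put(instance, new ArrayList<>());
        }
        for (int p = 0; p < numPartitions; p++) {
            String owner = instances.get(p % instances.size());
            assignment.get(owner).add(p);
        }
        return assignment;
    }

    public static void main(String[] args) {
        // Topic "A" with 10 partitions, as in the question's example.
        System.out.println(assign(10, Arrays.asList("app-1")));
        // A second instance joins: the partitions are redistributed (a simulated rebalance).
        System.out.println(assign(10, Arrays.asList("app-1", "app-2")));
    }
}
```

With a single instance, every partition (including partition 4) lands on that instance; once a second instance joins, the partitions are redistributed without the application choosing which ones it gets.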

4
votes

You can do something similar to what you need using PartitionGrouper. A partition grouper is used to create stream tasks based on the given topic partitions.

For an example, refer to the DefaultPartitionGrouper implementation. It would require customization, though.

Therefore, as @ppatierno suggested, please review your use case and design the topology so that you do not have to deviate from standard practice.
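To make the grouping idea concrete, here is a rough model of how partitions can be mapped to tasks: one task per partition index, where task i receives partition i of every topic that has one. This is only a sketch of the idea as I understand the default behaviour; TaskGroupingSketch, groupByPartition and the "topic-partition" string labels are hypothetical and not part of the Kafka Streams API.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Simplified model; none of these names exist in Kafka Streams.
class TaskGroupingSketch {

    // One task per partition index; task p receives partition p of every
    // topic that has at least p + 1 partitions.
    static Map<Integer, Set<String>> groupByPartition(Map<String, Integer> partitionCounts) {
        int maxPartitions = 0;
        for (int count : partitionCounts.values()) {
            maxPartitions = Math.max(maxPartitions, count);
        }
        Map<Integer, Set<String>> tasks = new LinkedHashMap<>();
        for (int p = 0; p < maxPartitions; p++) {
            Set<String> group = new LinkedHashSet<>();
            for (Map.Entry<String, Integer> e : partitionCounts.entrySet()) {
                if (p < e.getValue()) {
                    group.add(e.getKey() + "-" + p);
                }
            }
            tasks.put(p, group);
        }
        return tasks;
    }

    public static void main(String[] args) {
        // Topic "A" with 10 partitions, as in the question's example.
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("A", 10);
        Map<Integer, Set<String>> tasks = groupByPartition(counts);
        System.out.println(tasks.size() + " tasks; task 4 reads " + tasks.get(4));
    }
}
```

In this model the topic from the question yields 10 tasks, and only task 4 touches partition 4. A customized grouper would have to change this mapping, which is the deviation from standard practice mentioned above.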

1
votes

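You can do this with Spark Streaming's Kafka integration (not Kafka Streams) by assigning the topic partition explicitly and giving it a starting offset:

    import org.apache.kafka.common.TopicPartition
    import org.apache.spark.streaming.kafka010.KafkaUtils
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Assign
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent

    // Assign only this partition, starting at offset 2
    val offsets = Map(new TopicPartition(topic, partition) -> 2L)
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent,
      Assign[String, String](offsets.keys, kafkaParams, offsets))

where partition is the partition number and 2L is the starting offset within that partition. Assign is used rather than Subscribe because Subscribe would consume every partition of the listed topics.

Refer to streaming_from_specific_partiton for more details.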

0
votes

You cannot pick a partition yourself when consuming as part of a consumer group, because automatic partition assignment is how Kafka scales; that is simply how a distributed system works. What you can do is segment your data and give each segment its own topic, where each topic has only one partition.

Since topics are registered in ZooKeeper, you might run into issues if you try to add too many of them, e.g. if you have a million users and decide to create a topic per user.