
I am evaluating Message Brokers for a project and could not find a clear answer to whether Apache Kafka supports the following use case:

  • Messages are pushed to topics with additional attributes that consumers can filter on when receiving them. These attributes can be considered a primary key for each message; in the simplest form it is just one ID attribute (e.g. the ID of a sensor that irregularly produces measurement data).
  • 0 to n consumers receive these messages from the topics, possibly filtering on the primary key.
  • Messages are not consumed when they are received, so all consumers on a topic will receive every message pushed to it as long as they are consuming (being "online").
  • When there is no consumer receiving messages from a topic, the Message Broker at least keeps the internal state of messages per primary key up to date.
  • When a consumer subscribes to a topic, it must be able to receive, at start, the last message written per primary key, and from then on all new messages pushed to the topic, possibly filtered by the primary key. The receiver should be able to recognize somehow that all messages of the initial state present at start have been received.

Does Kafka support this use case and how could this be achieved? If Kafka is not able to provide this functionality, what other Message Brokers might be able to do so?


1 Answer


Messages with additional attributes that consumers can filter on when receiving them

You can filter messages using Kafka Streams or KSQL. The output of this operation will be a new topic for consumers to read from.
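
For illustration, a minimal Kafka Streams sketch that filters one topic into another by message key; the topic names, sensor id, and broker address are hypothetical placeholders:

    import java.util.Properties;
    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.KStream;

    public class SensorFilter {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "sensor-filter");
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
            props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

            StreamsBuilder builder = new StreamsBuilder();
            KStream<String, String> measurements = builder.stream("measurements");
            // Keep only the messages for one sensor and write them to a new topic.
            measurements.filter((sensorId, value) -> "sensor-42".equals(sensorId))
                        .to("measurements-sensor-42");

            new KafkaStreams(builder.build(), props).start();
        }
    }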

Alternatively, you can partition the topic by this "ID" field by setting it as the Kafka message key; whether that is practical depends on the cardinality of that value.
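
On the producer side, that simply means setting the id as the record key; a sketch (topic, id, and broker address are placeholders):

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class SensorProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("key.serializer", StringSerializer.class.getName());
            props.put("value.serializer", StringSerializer.class.getName());

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // The key determines the partition, so ordering per sensor is
                // preserved within its partition.
                producer.send(new ProducerRecord<>("measurements", "sensor-42", "23.5"));
            }
        }
    }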

0 to n consumers receive these messages from the topics

Yes, a Kafka topic can have any number of consumers, and each consumer group independently receives the full stream of messages.

Messages are not consumed when they are received

Unclear what this means: in Kafka, "consume" and "receive" are the same thing, and consuming a message does not remove it from the topic.

all consumers on a topic will receive every message pushed to it as long as they are consuming

Messages are not "pushed" to online consumers, they are poll-ed. Any subscribed consumer will see messages it has requested from its topics
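
For illustration, a minimal poll loop with the Java client; the topic, group id, and broker address are placeholders:

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.serialization.StringDeserializer;

    public class SensorConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("group.id", "sensor-dashboard");
            props.put("key.deserializer", StringDeserializer.class.getName());
            props.put("value.deserializer", StringDeserializer.class.getName());

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("measurements"));
                while (true) {
                    // poll() pulls the next batch from the broker; nothing is pushed.
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                    for (ConsumerRecord<String, String> record : records) {
                        System.out.printf("%s -> %s%n", record.key(), record.value());
                    }
                }
            }
        }
    }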

When there is no consumer receiving messages from a topic, the Message Broker at least keeps the internal state of messages per primary key up to date

Kafka has no primary keys; it has offsets, plus an optional key on each message. Messages are deleted according to the topic's retention settings whether or not anyone is consuming them (committed consumer offsets also expire after their own retention window). With cleanup.policy=compact on a topic, the broker instead retains at least the latest message per message key, which is the closest built-in match to "state per primary key". Message content itself is never modified in place.
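
If retaining the latest message per key broker-side is what you are after, here is a sketch of creating a compacted topic with the Java AdminClient; the topic name, sizing, and broker address are assumptions:

    import java.util.Collections;
    import java.util.Map;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.NewTopic;
    import org.apache.kafka.common.config.TopicConfig;

    public class CreateCompactedTopic {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            try (AdminClient admin = AdminClient.create(props)) {
                // cleanup.policy=compact keeps at least the newest message per key.
                NewTopic topic = new NewTopic("measurements", 3, (short) 1)
                        .configs(Map.of(TopicConfig.CLEANUP_POLICY_CONFIG,
                                        TopicConfig.CLEANUP_POLICY_COMPACT));
                admin.createTopics(Collections.singletonList(topic)).all().get();
            }
        }
    }

Note that compaction runs lazily in the background, so consumers may still see a few superseded messages per key.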

When a consumer subscribes to a topic, it must be able to receive, at start, the last message written per primary key, and from then on all new messages

Setting auto.offset.reset=earliest ensures that every new consumer group begins reading from the earliest offset still available.
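
A minimal consumer configuration sketch (the group id is a placeholder); this setting only takes effect when the group has no committed offsets yet:

    bootstrap.servers=localhost:9092
    group.id=sensor-dashboard
    key.deserializer=org.apache.kafka.common.serialization.StringDeserializer
    value.deserializer=org.apache.kafka.common.serialization.StringDeserializer
    # Used only when this group has no committed offset for a partition.
    auto.offset.reset=earliest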

The receiver should be able to recognize somehow that all messages of the initial state present at start have been received.

Monitoring this is tricky, as it depends on the client: it comes down to checking the consumer group's offset lag, and as far as I have seen that is not provided out of the box on the client side. You can, however, run the consumer-groups command externally to check the lag.
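
For example, a hypothetical invocation against a local broker (the group name is a placeholder):

    kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
      --describe --group sensor-dashboard

The LAG column in the output shows, per partition, how far the group is behind the log end offset.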

Confluent Control Center can visualize message consumption rates, but the consumer protocol is designed to run continuously, not to stop at the "end" of a topic.
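
That said, a client can approximate "I have caught up" on its own by snapshotting the end offsets at startup and comparing its positions against them. A rough sketch with the Java consumer, using manual assignment to keep group rebalancing out of the picture (the topic name is a placeholder):

    import java.time.Duration;
    import java.util.List;
    import java.util.Map;
    import java.util.stream.Collectors;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.TopicPartition;

    static void readInitialState(KafkaConsumer<String, String> consumer, String topic) {
        List<TopicPartition> partitions = consumer.partitionsFor(topic).stream()
                .map(p -> new TopicPartition(p.topic(), p.partition()))
                .collect(Collectors.toList());
        consumer.assign(partitions);
        consumer.seekToBeginning(partitions);

        // Snapshot where each partition ends right now; this marks the
        // boundary of the "initial state".
        Map<TopicPartition, Long> endOffsets = consumer.endOffsets(partitions);

        boolean caughtUp = false;
        while (!caughtUp) {
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
            for (ConsumerRecord<String, String> record : records) {
                System.out.printf("%s -> %s%n", record.key(), record.value());
            }
            caughtUp = endOffsets.entrySet().stream()
                    .allMatch(e -> consumer.position(e.getKey()) >= e.getValue());
        }
        // Every record read after this point was produced after startup.
    }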


Overall, if you expect database-style primary keys and quick filtering, you can use Kafka as just a pipe into a database of your choice and slice and dice the data there.