8
votes

We're trying to build a platform using microservices that communicate async over kafka. It would seem natural, the way i understood it, to have 1 topic per aggregate type in each microservice. So a microservice implementing user registration would publish user related events into the topic "users". Other microservices would listen to events created from the "users" microservices and implement their own logic and fill their DBs accordingly. The problem is that other microservices might not be interested in all the events generated by the user microservice but rather a subset of these events, like UserCreated only (without UsernameChanged... for example). Using RabbitMq is easy since event handlers are invoked based on message type.

  1. Did you ever implement message based routing/filtering over kafka?
  2. Should we consume all the messages, deserialize them and ignore unneeded ones by the consumer? (sounds like an overhead)
  3. Should we forward these topics to storm and redirect these messages to consumer targeted topics? (sounds like an overkill and un-scalable)
  4. Using partitions doesn't seem logical as a routing mechanism
1
1 - Never seem anything like it in a year using Kafka. 2 - Pretty much, yes, unless you have more grained or targeted topics . The good thing is that consuming from Kafka is cheap if your client is fully featured with snappy support 3 - Only you can answer that. A lighter routing could be done with Camel. 4 - That's not the purpose of partitions at all. Kafka is a very powerful tool when you have huge chunks of data to make available quick, that comes at the cost of more subtle control how it gets distributed. You wouldn't water a garden with a firehose. - Rodrigo Del C. Andrade
Not only is consuming from Kafka cheap, but reading the whole stream and tossing what you want is actually pretty cheap from the producing side as well. Disk drives are 100s to 1,000s of times faster doing linear reads versus random reads. This is a huge part of the design philosophy of Kafka: kafka.apache.org/08/design.html - David Griffin

1 Answers

5
votes

Use a different topic for each of the standard object actions: Create, Read, Update, and Delete, with a naming convention like "UserCreated", "UserRead", etc. If you think about it, you will likely have a different schema for the objects in each. Created will require a valid object; Read will require some kind of filter; Update you might want to handle incremental updates (add 10 to a specific field, etc).

If the different actions have different schemas it makes deserialization difficult. If you're in a loosey-goosey language like JavaScript, ok -- no big deal. But a strictly typed language like Scala and having different schemas in this same topic is problematic.

It'll also solve you're problem -- you can listen for exactly the types of actions you want, no more, not less.