Basically, MirrorMaker (MM) is replicating more than I need it to.

I have four environments, DEV01, DEV02, TST01, and TST02, each with two servers running the same app, which generates JSON files. Logstash reads those files and pushes messages into two three-node Kafka Clusters, KAF01 and KAF02. The DEV01 and TST01 boxes push to the KAF01 Cluster, into corresponding DEV01 and TST01 topics, and the DEV02 and TST02 boxes push to the KAF02 Cluster, into corresponding DEV02 and TST02 topics. Logstash also runs on each of the Kafka nodes to push the messages into the corresponding Elasticsearch Clusters. This all works as expected.

I then added MM to replicate messages between environments, i.e., DEV01<->DEV02 and TST01<->TST02. I started the MM process for the DEV environments and everything worked fine. Then, on the same hosts, I started a second MM process for the TST environments. Everything seemed fine until I realized that I was seeing messages from TST in the DEV Elasticsearch Cluster and vice versa.

Here's a rough diagram of the flow:

[Image: Flow Diagram]

I have MM running on the first host in each Kafka Cluster, i.e., kaf01-01 and kaf02-01. On the KAF01 Cluster, kaf01-01 is set up to mirror both the dev01 and tst01 topics to the KAF02 Cluster:

kafka-mirror-maker.sh --consumer.config dev01_mm_source.properties --num.streams 1 --producer.config dev01_mm_target.properties --whitelist="dev01"

For --consumer.config, the dev01_mm_source.properties file is configured with the KAF01 Cluster nodes. For --producer.config, the dev01_mm_target.properties file is configured with the KAF02 Cluster nodes.
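For reference, here is a minimal sketch of what the dev01 pair of properties files might contain. It assumes MirrorMaker is using the Java consumer, which takes bootstrap.servers (the older ZooKeeper-based consumer used zookeeper.connect instead). The broker hostnames, port 9092, and the group name mm-dev01 are hypothetical. Giving each MirrorMaker process its own group.id keeps the DEV and TST processes on the same host from sharing a consumer group.

```shell
#!/usr/bin/env sh
# Sketch of the two properties files for the dev01 -> KAF02 MirrorMaker process.
# Assumptions: hostnames kaf01-0N/kaf02-0N and port 9092 are hypothetical, and
# MirrorMaker uses the Java consumer (bootstrap.servers, not zookeeper.connect).

# Consumer side (--consumer.config): read from the KAF01 Cluster.
cat > dev01_mm_source.properties <<'EOF'
bootstrap.servers=kaf01-01:9092,kaf01-02:9092,kaf01-03:9092
# One consumer group per MirrorMaker process (hypothetical name).
group.id=mm-dev01
EOF

# Producer side (--producer.config): write to the KAF02 Cluster.
cat > dev01_mm_target.properties <<'EOF'
bootstrap.servers=kaf02-01:9092,kaf02-02:9092,kaf02-03:9092
EOF
```

Under these assumptions, the tst01 pair would differ only in group.id, and the dev02/tst02 pairs would swap the two broker lists.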

kafka-mirror-maker.sh --consumer.config tst01_mm_source.properties --num.streams 1 --producer.config tst01_mm_target.properties --whitelist="tst01"

For --consumer.config, the tst01_mm_source.properties file is configured with the KAF01 Cluster nodes. For --producer.config, the tst01_mm_target.properties file is configured with the KAF02 Cluster nodes.

On the KAF02 Cluster, kaf02-01 is set up to mirror both the dev02 and tst02 topics to the KAF01 Cluster:

kafka-mirror-maker.sh --consumer.config dev02_mm_source.properties --num.streams 1 --producer.config dev02_mm_target.properties --whitelist="dev02"

For --consumer.config, the dev02_mm_source.properties file is configured with the KAF02 Cluster nodes. For --producer.config, the dev02_mm_target.properties file is configured with the KAF01 Cluster nodes.

kafka-mirror-maker.sh --consumer.config tst02_mm_source.properties --num.streams 1 --producer.config tst02_mm_target.properties --whitelist="tst02"

For --consumer.config, the tst02_mm_source.properties file is configured with the KAF02 Cluster nodes. For --producer.config, the tst02_mm_target.properties file is configured with the KAF01 Cluster nodes.

Do I have things mixed up? Do I have the --consumer.config and --producer.config files backwards? Is the regex I'm using for the --whitelist option incorrect? I'm not really using a regex, either, just a quoted string. I've triple-checked that Logstash on all of the app boxes is configured to push to the correct Kafka topic, and that Logstash on the Kafka boxes is configured to pull from the correct Kafka topic and then push to the correct Elasticsearch Cluster.

I just started working with Kafka and MM today, so I'm totally new to all of this; any and all help is greatly appreciated.
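On the --whitelist question: as far as I know, MirrorMaker treats the value as a Java regular expression that must match the entire topic name, so a plain quoted "dev01" already behaves like an exact match and cannot pick up tst01. The sketch below approximates that full-match rule with grep -xE (whole-line match, extended regex). It illustrates the matching rule only, not MirrorMaker itself, and the topic dev01_archive is invented to show a near-miss.

```shell
#!/usr/bin/env sh
# Approximates full-match topic filtering: -x requires the pattern to match the
# whole line, -E enables extended regex (alternation with |).
# Assumption: this mirrors Java full-match semantics only for simple patterns like these.

# Hypothetical topic list; dev01_archive is invented to show a near-miss.
topics='dev01
dev02
dev01_archive
tst01
tst02'

# Print the topics that a given whitelist pattern would select.
selected() {
  printf '%s\n' "$topics" | grep -xE "$1"
}

echo 'whitelist=dev01:'
selected 'dev01'          # only dev01, not dev01_archive
echo 'whitelist=dev01|tst01:'
selected 'dev01|tst01'    # dev01 and tst01
echo 'whitelist=^(dev01)$:'
selected '^(dev01)$'      # anchors are redundant under full-match rules
```

Under the same rule, a single MirrorMaker process per source Cluster could cover both environments with --whitelist 'dev01|tst01', cutting the four processes down to two.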

I don't quite understand the arrows. Are you running a two-way mirror? I would suggest researching MirrorMaker's message handler argument and renaming each mirrored topic with a suffix indicating its origin. For example: github.com/gwenshap/kafka-examples/blob/master/… - OneCricketeer
Possibly just a bad choice of arrow. I didn't think a two-way mirror was possible. KAF01 pushes messages from the DEV01 and TST01 topics to the KAF02 Cluster topics, and KAF02 pushes messages from the DEV02 and TST02 topics to the KAF01 Cluster topics. - Tronyx
That's not bidirectional mirroring; it's just the same command run with the producer and consumer flipped. Is there a particular reason you're running one MirrorMaker per topic rather than using the whitelist to mirror both topics at once? Then you would only have two processes to maintain and debug rather than four. You might want to use the regex ^(topic)$ to force an exact match on your topic name. - OneCricketeer
OK, that makes more sense. I didn't realize that I could do multiple topics per MM instance, either. Would that just be multiple entries in the --whitelist, i.e., --whitelist="^(topic1)$,^(topic2)$"? Also, I'm running MM on the source Cluster; is that correct? - Tronyx
@cricket_007 Thank you, I have figured this out. I was trying to have Logstash output to two different ES Clusters, which a single instance of Logstash cannot do, so it was mushing them together. MirrorMaker is working as expected. I've changed where Logstash is running to separate this out more, and everything is now working as expected. - Tronyx

1 Answer

I have figured this out. I was trying to have a single Logstash instance output to two different ES Clusters, which it apparently cannot do, so it was mushing the two streams together. MirrorMaker is working as expected. I've moved Logstash onto each of the Elasticsearch nodes themselves, pulling from the Kafka topics, to separate this out more, and everything is now working as expected.
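A likely explanation for the mixing, worth checking on any similar setup: when Logstash is pointed at a directory of config files (for example /etc/logstash/conf.d), it concatenates them into a single pipeline, so every event from every kafka input reaches every elasticsearch output unless the outputs are wrapped in conditionals. Logstash 6.0 and later can also keep the flows apart inside one instance by declaring multiple pipelines in pipelines.yml. The sketch below generates such a layout under ./logstash; the pipeline IDs, Elasticsearch hostnames (dev-es-01, tst-es-01), and broker list are hypothetical, and it assumes a Logstash version with multi-pipeline support.

```shell
#!/usr/bin/env sh
# Sketch: isolate DEV and TST flows with Logstash multiple pipelines (6.0+).
# Assumptions: hostnames and pipeline IDs are hypothetical; the files are written
# to ./logstash but pipelines.yml references their installed path under /etc/logstash.
mkdir -p logstash/conf.d

# Each pipeline reads only its own config file, so events never cross between them.
cat > logstash/pipelines.yml <<'EOF'
- pipeline.id: dev01
  path.config: "/etc/logstash/conf.d/dev01.conf"
- pipeline.id: tst01
  path.config: "/etc/logstash/conf.d/tst01.conf"
EOF

# Write one Kafka-in / Elasticsearch-out config for a single environment.
# $1 = topic/environment name, $2 = Elasticsearch host for that environment.
write_pipeline() {
  name=$1
  es_host=$2
  cat > "logstash/conf.d/${name}.conf" <<EOF
input {
  kafka {
    bootstrap_servers => "kaf01-01:9092,kaf01-02:9092,kaf01-03:9092"
    topics => ["${name}"]
    group_id => "logstash-${name}"
  }
}
output {
  elasticsearch {
    hosts => ["http://${es_host}:9200"]
  }
}
EOF
}

write_pipeline dev01 dev-es-01
write_pipeline tst01 tst-es-01
```

Moving Logstash onto the Elasticsearch nodes, as described above, achieves the same isolation by giving each ES Cluster its own Logstash instance; the pipelines.yml approach keeps a single instance instead.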