
For a cross-network Confluent Platform setup, we have one Kafka cluster on-premise and another on AWS, with data replicated from on-prem to AWS using MirrorMaker. Both clusters are independent, with their own Schema Registry, REST Proxy, and Connect. Both clusters have different sets of producers and consumers, and selected topics are mirrored between the clusters.

What is the best practice for deploying Schema Registry? Should we have one master (say, on-premise) and the other instances as non-eligible masters on-prem and on AWS?

We suspect Schema Registry can have issues with schema IDs when topics are replicated between clusters and there are two masters (AWS and on-prem).

Thanks!

Yes. I referred to this, but wanted to make sure whether keeping one Schema Registry master per DC is a hard requirement. – Tony
@cricket_007 - Is it a recommended solution to keep one Schema Registry master for two independent active clusters? We have producers and consumers running on both clusters and want to deploy Schema Registry services for them. If we deploy a single SR cluster with the master in one DC and slaves in both DCs, do we need to mirror any schema topics? – Tony
PS: we do not have the Confluent enterprise Replicator, which is mentioned almost everywhere in the multi-DC deployment documentation :) – Tony

1 Answer


If you use two different master registries, I find that would be difficult to manage. (See mistake #2 for self-managed registries.) The purpose of master.eligibility=false on a second instance/cluster is that all ID registration events have a single source of truth. As the docs say, "The Schema Registry nodes in both datacenters link to the primary Kafka cluster in DC A", so you would need to establish a valid network link between AWS and on-prem anyway.
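For illustration, the non-eligible instance in the secondary datacenter would be configured along these lines (hostnames are placeholders, and exact property names vary slightly between Schema Registry versions):

    # schema-registry.properties on the AWS (secondary) instance -- hostnames are placeholders
    listeners=http://0.0.0.0:8081
    # point the store at the primary (on-prem) Kafka cluster, not the local one
    kafkastore.bootstrap.servers=PLAINTEXT://onprem-kafka-1:9092,PLAINTEXT://onprem-kafka-2:9092
    kafkastore.topic=_schemas
    # this node serves reads and forwards writes, but can never be elected master
    master.eligibility=false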

Otherwise, with multiple masters, you will need to mirror the schemas topic if you want the exact same subjects and schema IDs in both environments. However, that is primarily meant as a backup, and you would eventually run into conflicting schema IDs as soon as any producer in the destination region pushes schemas to the other master. Hence why the first diagram shows only consumers in the remote datacenter.
If you do not do this, then say you mirrored a topic from cluster A to cluster B and the consumer used registry B in its settings: it would attempt to look up an ID issued by registry A (the ID is embedded in the message), and that ID either would not exist in registry B or would point at the wrong schema for the topic being read.
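To make the "embedded in the message" part concrete: Confluent-serialized records carry a magic byte followed by a 4-byte schema ID before the Avro payload, and the deserializer resolves that ID against whatever registry it is configured with. A minimal sketch of reading that ID:

    import java.nio.ByteBuffer;

    // Sketch: where the Confluent Avro (de)serializers put the schema ID in a record value.
    // Wire format: [magic byte 0x00][4-byte big-endian schema ID][Avro binary payload]
    public class WireFormatPeek {
        public static int schemaIdOf(byte[] value) {
            ByteBuffer buffer = ByteBuffer.wrap(value);
            if (buffer.get() != 0x0) {
                throw new IllegalArgumentException("Not Confluent wire format");
            }
            return buffer.getInt(); // this ID only means something to the registry that issued it
        }
    }

A topic mirrored byte-for-byte keeps the ID issued by registry A, which is why reading it against registry B fails unless the IDs happen to line up.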

I wrote a Kafka Connect plugin to work around that issue by registering a new ID in a remote master registry: https://github.com/cricket007/schema-registry-transfer-smt. However, you said you're using MirrorMaker, so you would need to take the logic there and apply it to MirrorMaker's MessageHandler interface.
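The core of that logic is roughly: read the source schema ID out of the record, fetch the schema from the source registry, register it against the destination registry, and rewrite the embedded ID. A rough sketch using the Avro-era SchemaRegistryClient API — the registry URLs are placeholders, the exact client methods depend on your Confluent version, and wiring this into a MirrorMaker MessageHandler is left out:

    import java.nio.ByteBuffer;
    import org.apache.avro.Schema;
    import io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient;
    import io.confluent.kafka.schemaregistry.client.SchemaRegistryClient;

    // Sketch only: re-register a record's schema in the destination registry and
    // rewrite the embedded schema ID so consumers in the destination DC can read it.
    public class SchemaIdRewriter {
        private final SchemaRegistryClient source =
                new CachedSchemaRegistryClient("http://onprem-registry:8081", 100); // placeholder URL
        private final SchemaRegistryClient destination =
                new CachedSchemaRegistryClient("http://aws-registry:8081", 100);    // placeholder URL

        public byte[] rewrite(String topic, byte[] value) throws Exception {
            ByteBuffer in = ByteBuffer.wrap(value);
            if (in.get() != 0x0) {
                return value;                              // not Confluent wire format; pass through
            }
            int sourceId = in.getInt();
            Schema schema = source.getById(sourceId);                    // look up schema in the source DC
            int destId = destination.register(topic + "-value", schema); // register (or find) it in the destination DC

            byte[] payload = new byte[in.remaining()];
            in.get(payload);

            ByteBuffer out = ByteBuffer.allocate(1 + 4 + payload.length);
            out.put((byte) 0x0).putInt(destId).put(payload);
            return out.array();
        }
    }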

I've really only worked with one master, on-prem; in AWS, the registry settings have the ZooKeeper connection pointing at the on-prem cluster.
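In config terms, the AWS instance's store settings just point back across the link, e.g. (host is a placeholder; on newer versions you would use kafkastore.bootstrap.servers instead of the ZooKeeper URL):

    kafkastore.connection.url=onprem-zookeeper:2181
    master.eligibility=false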

And we don't mirror everything as the docs suggest, only specific topics. The purpose of using Replicator rather than MirrorMaker is that consumer failover is better supported: rather than simply getting data "over the wire", your clients also become less dependent on where they are running.
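For reference, selective mirroring with plain MirrorMaker is just a whitelist of topics to copy; the hostnames in the config files and the topic names below are placeholders:

    kafka-mirror-maker --consumer.config onprem-consumer.properties \
                       --producer.config aws-producer.properties \
                       --whitelist "orders|payments"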