There are a few different issues that can cause this. One is explicitly defined CLUSSDR channels pointing to a non-repository QMgr. This causes repository messages to arrive at the non-repository QMgr, which can cause its amqrrmfa repository process to die. Another is that there have been a few APARs (such as this one) which can lead to that process dying. The solutions, respectively, are to fix the configuration issues or to apply the latest Fix Pack. A third issue, less commonly seen, is that a message to a new QMgr will error out before the name of the new QMgr has been resolved at the local QMgr. In this case, the REFRESH doesn't actually cause the remote QMgr to resolve; it just provides time for the resolution to complete.
Debugging this involves isolating the possible causes. Check that amqrrmfa is running. Check that all non-repository QMgrs have one and ONLY one explicitly defined CLUSSDR channel. Verify that each repository has one and ONLY one explicitly defined CLUSSDR to each other repository. If overlapping clusters are used, make sure NOT to overlap the channels. This means avoiding channel names like TO.QMGR and preferring names like CLUSTER.QMGR. Verify this by ensuring channels do not use the CLUSNL attribute and use the CLUSTER attribute instead. Finally, reconcile the objects in both repositories and the non-repository by issuing DIS CLUSQMGR(*) and DIS QCLUSTER(*) on each. The repositories should have identical object inventories. If they differ, that is the problem. The non-repository should have an entry for every QMgr it has previously talked to.
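The reconciliation step can be tedious to do by eye, so here is a minimal sketch of one way to diff the inventories. It assumes you have captured the text output of DIS CLUSQMGR(*) (or DIS QCLUSTER(*)) from each repository, and that the output lists attributes in the usual KEYWORD(value) form. The function names and the sample output below are hypothetical and hand-written for illustration; real runmqsc output may be laid out differently.

```python
import re

# Matches MQSC display attributes of the form KEYWORD(value), e.g. CLUSQMGR(QM1).
ATTR = re.compile(r"([A-Z]+)\(([^)]*)\)")


def names(mqsc_output, keyword):
    """Collect the values of one keyword (e.g. CLUSQMGR or QUEUE) from display output.

    Values containing '*' are skipped so that an echoed command such as
    DIS CLUSQMGR(*) is not mistaken for an object name.
    """
    return {
        value.strip()
        for key, value in ATTR.findall(mqsc_output)
        if key == keyword and "*" not in value
    }


def reconcile(repo_a_output, repo_b_output, keyword):
    """Return (names only in repository A, names only in repository B), both sorted."""
    a = names(repo_a_output, keyword)
    b = names(repo_b_output, keyword)
    return sorted(a - b), sorted(b - a)


# Hypothetical sample output, not captured from a real queue manager.
REPO1_OUT = """
     1 : DIS CLUSQMGR(*)
   CLUSQMGR(REPO1) CHANNEL(DEMO.REPO1) CLUSTER(DEMO)
   CLUSQMGR(REPO2) CHANNEL(DEMO.REPO2) CLUSTER(DEMO)
   CLUSQMGR(QM3) CHANNEL(DEMO.QM3) CLUSTER(DEMO)
"""
REPO2_OUT = """
     1 : DIS CLUSQMGR(*)
   CLUSQMGR(REPO1) CHANNEL(DEMO.REPO1) CLUSTER(DEMO)
   CLUSQMGR(REPO2) CHANNEL(DEMO.REPO2) CLUSTER(DEMO)
"""

if __name__ == "__main__":
    only_in_1, only_in_2 = reconcile(REPO1_OUT, REPO2_OUT, "CLUSQMGR")
    print("only in first repository:", only_in_1)
    print("only in second repository:", only_in_2)
```

Two empty lists mean the repositories agree for that keyword. Anything else names the objects to chase down. For DIS QCLUSTER(*) output the same sketch should work with the keyword QUEUE, assuming the queue names appear as QUEUE(name).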
One thing I have seen in the past was that an administrator had scheduled a REFRESH CLUSTER. His thinking was that this was something they needed to do to fix the cluster, so why not run it on a regular basis? So he scheduled it to run daily. Each night it made the QMgr forget about the other QMgrs in the cluster, and the first time an app resolved a remote QMgr each day there was a flurry of repository traffic. This caused enough of a delay that there were a few 2087 (MQRC_UNKNOWN_REMOTE_Q_MGR) errors each morning. Not that you would do such a thing. :-)
Cluster member no longer shows up at repository DIS CLUSQMGR? What version of WMQ and what do the error logs show when this happens? – T.Rob