0
votes

After struggling with Artemis 2.11 and an older Java version I decided to update my whole system to the "latest greatest software" that is currently available. So I am using Artemis 2.14 and Java 14.0.2 on two Ubuntu 18.04 VM with 4 Cores an 16 GB RAM.

I configured the master-slave ha-policy like this:

MASTER:

<ha-policy>
    <replication>
       <master>
          <check-for-live-server>true</check-for-live-server>
       </master>
    </replication>
</ha-policy>

SLAVE

<ha-policy>
    <replication>
        <slave>
           <allow-failback>true</allow-failback>
        </slave>
    </replication>
</ha-policy>

And I am using the cluster-connection like this:

MASTER

<cluster-connections>
  <cluster-connection name="artemis-cluster">
    <connector-ref>Artemis-Node-A-Sync</connector-ref>
    <retry-interval>500</retry-interval>
    <use-duplicate-detection>true</use-duplicate-detection>
    <message-load-balancing>ON_DEMAND</message-load-balancing>
    <max-hops>1</max-hops>
    <static-connectors>
      <connector-ref>Node-B-Sync</connector-ref>
    </static-connectors>
  </cluster-connection>
</cluster-connections>

SLAVE

<cluster-connections>
  <cluster-connection name="artemis-cluster">
    <connector-ref>Node-B-Sync</connector-ref>
    <retry-interval>500</retry-interval>
    <use-duplicate-detection>true</use-duplicate-detection>
    <message-load-balancing>ON_DEMAND</message-load-balancing>
    <max-hops>1</max-hops>
    <static-connectors>
      <connector-ref>Node-A-Sync</connector-ref>
    </static-connectors>
  </cluster-connection>
</cluster-connections>

My problem is that I get this ERROR message from at SLAVE...

2020-08-07 12:45:37,548 INFO  [io.hawt.system.ConfigManager] Configuration will be discovered via system properties
2020-08-07 12:45:37,550 INFO  [io.hawt.jmx.JmxTreeWatcher] Welcome to hawtio 1.5.12 : http://hawt.io/ : Don't cha wish your console was hawt like me? ;-)
2020-08-07 12:45:37,552 INFO  [io.hawt.jmx.UploadManager] Using file upload directory: /opt/mybroker-broker/tmp/uploads
2020-08-07 12:45:37,565 INFO  [io.hawt.web.AuthenticationFilter] Starting hawtio authentication filter, JAAS realm: "activemq" authorized role(s): "amq" role principal classes: "org.apache.activemq.artemis.spi.core.security.jaas.RolePrincipal"
2020-08-07 12:45:37,585 INFO  [io.hawt.web.JolokiaConfiguredAgentServlet] Jolokia overridden property: [key=policyLocation, value=file:/opt/mybroker-broker/etc/jolokia-access.xml]
2020-08-07 12:45:37,605 INFO  [io.hawt.web.RBACMBeanInvoker] Using MBean [hawtio:type=security,area=jmx,rank=0,name=HawtioDummyJMXSecurity] for role based access control
2020-08-07 12:45:37,703 INFO  [io.hawt.system.ProxyWhitelist] Initial proxy whitelist: [localhost, 127.0.0.1, *.*.*.*, *.*.*.*, localhost.localdomain]
2020-08-07 12:45:37,966 INFO  [org.apache.activemq.artemis] AMQ241001: HTTP Server started at http://0.0.0.0:8161
2020-08-07 12:45:37,967 INFO  [org.apache.activemq.artemis] AMQ241002: Artemis Jolokia REST API available at http://0.0.0.0:8161/console/jolokia
2020-08-07 12:45:37,967 INFO  [org.apache.activemq.artemis] AMQ241004: Artemis Console available at http://0.0.0.0:8161/console
2020-08-07 12:45:46,905 WARN  [org.apache.activemq.artemis.core.client] AMQ212037: Connection failure to 192.168.144.75/192.168.144.75:22522 has been detected: AMQ219015: The connection was disconnected because of server shutdown [code=DISCONNECTED]
2020-08-07 12:45:50,678 INFO  [org.apache.activemq.artemis.core.server] AMQ221066: Initiating quorum vote: LiveFailoverQuorumVote
2020-08-07 12:45:50,678 INFO  [org.apache.activemq.artemis.core.server] AMQ221084: Requested 0 quorum votes
2020-08-07 12:45:50,679 INFO  [org.apache.activemq.artemis.core.server] AMQ221083: ignoring quorum vote as max cluster size is 1.
2020-08-07 12:45:50,679 INFO  [org.apache.activemq.artemis.core.server] AMQ221071: Failing over based on quorum vote results.
2020-08-07 12:45:50,720 ERROR [org.apache.activemq.artemis.core.server] AMQ224000: Failure in initialisation: ActiveMQIllegalStateException[errorType=ILLEGAL_STATE message=AMQ229026: Backup Server was not yet in sync with live]
        at org.apache.activemq.artemis.core.server.impl.SharedNothingBackupActivation.run(SharedNothingBackupActivation.java:310) [artemis-server-2.14.0.jar:2.14.0]
        at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$ActivationThread.run(ActiveMQServerImpl.java:3946) [artemis-server-2.14.0.jar:2.14.0]

The logfile at master is:

2020-08-07 12:44:58,292 INFO  [org.apache.activemq.artemis.core.server] AMQ221014: 21% loaded
2020-08-07 12:44:58,540 INFO  [org.apache.activemq.artemis.core.server] AMQ221014: 42% loaded
2020-08-07 12:44:59,020 INFO  [org.apache.activemq.artemis.core.server] AMQ221014: 64% loaded
2020-08-07 12:44:59,416 INFO  [org.apache.activemq.artemis.core.server] AMQ221014: 85% loaded
2020-08-07 12:45:12,143 INFO  [org.apache.activemq.artemis.core.server] AMQ221080: Deploying address DLQ supporting [ANYCAST]
2020-08-07 12:45:12,145 INFO  [org.apache.activemq.artemis.core.server] AMQ221003: Deploying ANYCAST queue DLQ on address DLQ
2020-08-07 12:45:12,151 INFO  [org.apache.activemq.artemis.core.server] AMQ221080: Deploying address ExpiryQueue supporting [ANYCAST]
2020-08-07 12:45:12,152 INFO  [org.apache.activemq.artemis.core.server] AMQ221003: Deploying ANYCAST queue ExpiryQueue on address ExpiryQueue
2020-08-07 12:45:12,382 INFO  [org.apache.activemq.artemis.core.server] AMQ221020: Started EPOLL Acceptor at 0.0.0.0:1883 for protocols [MQTT]
2020-08-07 12:45:12,385 INFO  [org.apache.activemq.artemis.core.server] AMQ221020: Started EPOLL Acceptor at 0.0.0.0:22522 for protocols [CORE,MQTT,AMQP,STOMP,HORNETQ,OPENWIRE]
2020-08-07 12:45:12,387 INFO  [org.apache.activemq.artemis.core.server] AMQ221020: Started EPOLL Acceptor at 0.0.0.0:8883 for protocols [MQTT]
2020-08-07 12:45:12,388 INFO  [org.apache.activemq.artemis.core.server] AMQ221020: Started EPOLL Acceptor at 0.0.0.0:18884 for protocols [MQTT]
2020-08-07 12:45:12,388 INFO  [org.apache.activemq.artemis.core.server] AMQ221007: Server is now live
2020-08-07 12:45:12,389 INFO  [org.apache.activemq.artemis.core.server] AMQ221001: Apache ActiveMQ Artemis Message Broker version 2.14.0 [0.0.0.0, nodeID=95f808d9-d641-11ea-9c48-005056073c33]
2020-08-07 12:45:12,650 INFO  [org.apache.activemq.hawtio.branding.PluginContextListener] Initialized activemq-branding plugin
2020-08-07 12:45:12,686 INFO  [org.apache.activemq.hawtio.plugin.PluginContextListener] Initialized artemis-plugin plugin
2020-08-07 12:45:13,281 INFO  [io.hawt.HawtioContextListener] Initialising hawtio services
2020-08-07 12:45:13,304 INFO  [io.hawt.system.ConfigManager] Configuration will be discovered via system properties
2020-08-07 12:45:13,306 INFO  [io.hawt.jmx.JmxTreeWatcher] Welcome to hawtio 1.5.12 : http://hawt.io/ : Don't cha wish your console was hawt like me? ;-)
2020-08-07 12:45:13,311 INFO  [io.hawt.jmx.UploadManager] Using file upload directory: /opt/mybroker-broker/tmp/uploads
2020-08-07 12:45:13,325 INFO  [io.hawt.web.AuthenticationFilter] Starting hawtio authentication filter, JAAS realm: "activemq" authorized role(s): "amq" role principal classes: "org.apache.activemq.artemis.spi.core.security.jaas.RolePrincipal"
2020-08-07 12:45:13,344 INFO  [io.hawt.web.JolokiaConfiguredAgentServlet] Jolokia overridden property: [key=policyLocation, value=file:/opt/mybroker-broker/etc/jolokia-access.xml]
2020-08-07 12:45:13,373 INFO  [io.hawt.web.RBACMBeanInvoker] Using MBean [hawtio:type=security,area=jmx,rank=0,name=HawtioDummyJMXSecurity] for role based access control
2020-08-07 12:45:13,472 INFO  [io.hawt.system.ProxyWhitelist] Initial proxy whitelist: [localhost, 127.0.0.1, *.*.*.*, 192.168.144.75, localhost.localdomain]
2020-08-07 12:45:13,680 INFO  [org.apache.activemq.artemis] AMQ241001: HTTP Server started at http://0.0.0.0:8161
2020-08-07 12:45:13,681 INFO  [org.apache.activemq.artemis] AMQ241002: Artemis Jolokia REST API available at http://0.0.0.0:8161/console/jolokia
2020-08-07 12:45:13,681 INFO  [org.apache.activemq.artemis] AMQ241004: Artemis Console available at http://0.0.0.0:8161/console
2020-08-07 12:45:46,608 WARN  [org.apache.activemq.artemis.core.server] AMQ222010: Critical IO Error, shutting down the server. file=NIOSequentialFile /opt/mybroker-broker/data/paging/9fbeba62-d6db-11ea-bb8e-005056073c33/000000003.page, message=/opt/mybroker-broker/data/paging/9fbeba62-d6db-11ea-bb8e-005056073c33/000000003.page (Too many open files): ActiveMQIOErrorException[errorType=IO_ERROR message=/opt/mybroker-broker/data/paging/9fbeba62-d6db-11ea-bb8e-005056073c33/000000003.page (Too many open files)]
        at org.apache.activemq.artemis.core.io.nio.NIOSequentialFile.open(NIOSequentialFile.java:156) [artemis-journal-2.14.0.jar:2.14.0]
        at org.apache.activemq.artemis.core.io.nio.NIOSequentialFile.open(NIOSequentialFile.java:98) [artemis-journal-2.14.0.jar:2.14.0]
        at org.apache.activemq.artemis.core.paging.impl.Page.open(Page.java:483) [artemis-server-2.14.0.jar:2.14.0]
        at org.apache.activemq.artemis.core.paging.impl.PagingStoreImpl.openNewPage(PagingStoreImpl.java:1136) [artemis-server-2.14.0.jar:2.14.0]
        at org.apache.activemq.artemis.core.paging.impl.PagingStoreImpl.forceAnotherPage(PagingStoreImpl.java:606) [artemis-server-2.14.0.jar:2.14.0]
        at org.apache.activemq.artemis.core.persistence.impl.journal.JournalStorageManager.getPageInformationForSync(JournalStorageManager.java:738) [artemis-server-2.14.0.jar:2.14.0]
        at org.apache.activemq.artemis.core.persistence.impl.journal.JournalStorageManager.startReplication(JournalStorageManager.java:665) [artemis-server-2.14.0.jar:2.14.0]
        at org.apache.activemq.artemis.core.server.impl.SharedNothingLiveActivation$2.run(SharedNothingLiveActivation.java:179) [artemis-server-2.14.0.jar:2.14.0]
        at java.base/java.lang.Thread.run(Thread.java:832) [java.base:]
Caused by: java.io.FileNotFoundException: /opt/mybroker-broker/data/paging/9fbeba62-d6db-11ea-bb8e-005056073c33/000000003.page (Too many open files)
        at java.base/java.io.RandomAccessFile.open0(Native Method) [java.base:]
        at java.base/java.io.RandomAccessFile.open(RandomAccessFile.java:347) [java.base:]
        at java.base/java.io.RandomAccessFile.<init>(RandomAccessFile.java:261) [java.base:]
        at java.base/java.io.RandomAccessFile.<init>(RandomAccessFile.java:216) [java.base:]
        at org.apache.activemq.artemis.core.io.nio.NIOSequentialFile.open(NIOSequentialFile.java:143) [artemis-journal-2.14.0.jar:2.14.0]
        ... 8 more

2020-08-07 12:45:46,615 WARN  [org.apache.activemq.artemis.core.server] AMQ222251: Unable to start replication: java.io.FileNotFoundException: /opt/mybroker-broker/data/paging/9fbeba62-d6db-11ea-bb8e-005056073c33/000000003.page (Too many open files)
        at java.base/java.io.RandomAccessFile.open0(Native Method) [java.base:]
        at java.base/java.io.RandomAccessFile.open(RandomAccessFile.java:347) [java.base:]
        at java.base/java.io.RandomAccessFile.<init>(RandomAccessFile.java:261) [java.base:]
        at java.base/java.io.RandomAccessFile.<init>(RandomAccessFile.java:216) [java.base:]
        at org.apache.activemq.artemis.core.io.nio.NIOSequentialFile.open(NIOSequentialFile.java:143) [artemis-journal-2.14.0.jar:2.14.0]
        at org.apache.activemq.artemis.core.io.nio.NIOSequentialFile.open(NIOSequentialFile.java:98) [artemis-journal-2.14.0.jar:2.14.0]
        at org.apache.activemq.artemis.core.paging.impl.Page.open(Page.java:483) [artemis-server-2.14.0.jar:2.14.0]
        at org.apache.activemq.artemis.core.paging.impl.PagingStoreImpl.openNewPage(PagingStoreImpl.java:1136) [artemis-server-2.14.0.jar:2.14.0]
        at org.apache.activemq.artemis.core.paging.impl.PagingStoreImpl.forceAnotherPage(PagingStoreImpl.java:606) [artemis-server-2.14.0.jar:2.14.0]
        at org.apache.activemq.artemis.core.persistence.impl.journal.JournalStorageManager.getPageInformationForSync(JournalStorageManager.java:738) [artemis-server-2.14.0.jar:2.14.0]
        at org.apache.activemq.artemis.core.persistence.impl.journal.JournalStorageManager.startReplication(JournalStorageManager.java:665) [artemis-server-2.14.0.jar:2.14.0]
        at org.apache.activemq.artemis.core.server.impl.SharedNothingLiveActivation$2.run(SharedNothingLiveActivation.java:179) [artemis-server-2.14.0.jar:2.14.0]
        at java.base/java.lang.Thread.run(Thread.java:832) [java.base:]

2020-08-07 12:45:46,693 WARN  [org.apache.activemq.artemis.utils.actors.OrderedExecutor] Server locator is closed (maybe it was garbage collected): java.lang.IllegalStateException: Server locator is closed (maybe it was garbage collected)
        at org.apache.activemq.artemis.core.client.impl.ServerLocatorImpl.assertOpen(ServerLocatorImpl.java:1844) [artemis-core-client-2.14.0.jar:2.14.0]
        at org.apache.activemq.artemis.core.client.impl.ServerLocatorImpl.createSessionFactory(ServerLocatorImpl.java:644) [artemis-core-client-2.14.0.jar:2.14.0]
        at org.apache.activemq.artemis.core.client.impl.ServerLocatorImpl.connect(ServerLocatorImpl.java:545) [artemis-core-client-2.14.0.jar:2.14.0]
        at org.apache.activemq.artemis.core.client.impl.ServerLocatorImpl.connect(ServerLocatorImpl.java:524) [artemis-core-client-2.14.0.jar:2.14.0]
        at org.apache.activemq.artemis.core.server.cluster.ClusterController$ConnectRunnable.run(ClusterController.java:433) [artemis-server-2.14.0.jar:2.14.0]
        at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:42) [artemis-commons-2.14.0.jar:2.14.0]
        at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:31) [artemis-commons-2.14.0.jar:2.14.0]
        at org.apache.activemq.artemis.utils.actors.ProcessorBase.executePendingTasks(ProcessorBase.java:65) [artemis-commons-2.14.0.jar:2.14.0]
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) [java.base:]
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) [java.base:]
        at org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118) [artemis-commons-2.14.0.jar:2.14.0]

2020-08-07 12:45:47,407 INFO  [io.hawt.HawtioContextListener] Destroying hawtio services
2020-08-07 12:45:47,409 INFO  [io.hawt.web.AuthenticationFilter] Destroying hawtio authentication filter 2020-08-07 12:45:47,442 INFO  [org.apache.activemq.hawtio.plugin.PluginContextListener] Destroyed artemis-plugin plugin
2020-08-07 12:45:47,444 INFO  [org.apache.activemq.hawtio.branding.PluginContextListener] Destroyed activemq-branding plugin
2020-08-07 12:45:47,476 INFO  [org.apache.activemq.artemis.core.server] AMQ221002: Apache ActiveMQ Artemis Message Broker version 2.14.0 [95f808d9-d641-11ea-9c48-005056073c33] stopped, uptime 1 minute
tail: /opt/mybroker-broker/log/artemis.log: file truncated
2020-08-07 12:46:07,212 INFO  [org.apache.activemq.artemis.integration.bootstrap] AMQ101000: Starting ActiveMQ Artemis Server
2020-08-07 12:46:07,242 INFO  [org.apache.activemq.artemis.core.server] AMQ221000: live Message Broker is starting with configuration Broker Configuration (clustered=true,journalDirectory=data/journal,bindingsDirectory=data/bindings,largeMessagesDirectory=data/large-messages,pagingDirectory=data/paging)
2020-08-07 12:46:09,516 INFO  [org.apache.activemq.artemis.core.server] AMQ221012: Using AIO Journal
2020-08-07 12:46:09,542 INFO  [org.apache.activemq.artemis.core.server] AMQ221057: Global Max Size is being adjusted to 1/2 of the JVM max size (-Xmx). being defined as 5,368,709,120
2020-08-07 12:46:09,589 INFO  [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-server]. Adding protocol support for: CORE
2020-08-07 12:46:09,590 INFO  [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-amqp-protocol]. Adding protocol support for: AMQP
2020-08-07 12:46:09,591 INFO  [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-hornetq-protocol]. Adding protocol support for: HORNETQ
2020-08-07 12:46:09,592 INFO  [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-mqtt-protocol]. Adding protocol support for: MQTT
2020-08-07 12:46:09,592 INFO  [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-openwire-protocol]. Adding protocol support for: OPENWIRE
2020-08-07 12:46:09,593 INFO  [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-stomp-protocol]. Adding protocol support for: STOMP
2020-08-07 12:46:09,776 WARN  [org.apache.activemq.artemis.core.server] AMQ222035: Directory /opt/mybroker-broker/data/paging/9fd77264-d6db-11ea-bb8e-005056073c33 did not have an identification file address.txt
2020-08-07 12:46:13,695 WARN  [org.apache.activemq.artemis.core.server] AMQ222035: Directory /opt/mybroker-broker/data/paging/9fd77264-d6db-11ea-bb8e-005056073c33 did not have an identification file address.txt

...when I do the following:

  1. Stop the SLAVE
  2. Restart the MASTER
  3. Start the SLAVE

Using Jolokia I see that the SLAVE is connected to the MASTER after some time.

Questions:

  1. What does the ERROR message mean?
  2. Is there an CLI command to get some information about synchronization?
  3. Is there an CLI command to get some system/cluster status information?
1
Can you provide the logs from the master server at the time the slave logs the ERROR? - Justin Bertram
You really need to focus your question down to a single issue. I count 3 different question here. There should only be 1 otherwise it's hard to identify correct answers. - Justin Bertram
@justin i added the logfile both master and slave in the question above - MF1

1 Answers

0
votes

The fundamental problem is noted in the log from the master broker:

2020-08-07 12:45:46,615 WARN  [org.apache.activemq.artemis.core.server] AMQ222251: Unable to start replication: java.io.FileNotFoundException: /opt/mybroker-broker/data/paging/9fbeba62-d6db-11ea-bb8e-005056073c33/000000003.page (Too many open files)

This is an environmental error indicating that the operating system will not allow the process (i.e. the broker) to open any more files.

Since the broker can't open all the files it needs then it can't properly synchronize its data with the backup. Furthermore, the inability to open additional files is interpreted as a "critical" error which forces the broker to shut itself down. When the master broker shuts down the slave activates, but since the master was not able to complete the initial replication with the slave the slave states:

2020-08-07 12:45:50,720 ERROR [org.apache.activemq.artemis.core.server] AMQ224000: Failure in initialisation: ActiveMQIllegalStateException[errorType=ILLEGAL_STATE message=AMQ229026: Backup Server was not yet in sync with live]

The slave is simply logging the fact that it didn't complete the initial synchronization/replication process with the master before the master was shut down.

Ultimately you need to increase the file limit for the user running the broker. A quick web search should help you with the specifics of that task for your particular operating system.