corosync/pacemaker treating OCF_RUNNING_MASTER as error

Question

I created an OCF resource agent and I want to run it as a Master/Slave set. At first, my monitor function returned OCF_SUCCESS on any running node (regardless of whether it was a master or a slave). That did work, but pacemaker did not know which instance was the current master (both instances were reported as slaves).

So I changed the monitor function to return OCF_RUNNING_MASTER on the master and OCF_SUCCESS on the slave, as I saw in the code of the drbd agent. Unfortunately, pacemaker seems to interpret this as an error: it kills the master, promotes the second node to master, and so on.
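
For reference, a monitor action that distinguishes the two roles this way could look roughly like the sketch below. It assumes the service's state can be read from two marker files in a hypothetical STATE_DIR, and the function name filecluster_monitor is illustrative only. The numeric codes are the OCF values for OCF_SUCCESS (0), OCF_NOT_RUNNING (7) and OCF_RUNNING_MASTER (8); a real agent would get them by sourcing ocf-shellfuncs rather than defining them.

```shell
#!/bin/sh
# Sketch of a role-aware monitor action for a master/slave agent.
# OCF return codes; a real agent sources ${OCF_FUNCTIONS_DIR}/ocf-shellfuncs instead.
: "${OCF_SUCCESS:=0}"
: "${OCF_NOT_RUNNING:=7}"
: "${OCF_RUNNING_MASTER:=8}"

# Hypothetical location of marker files describing the service state;
# replace with however the real service reports "running" and "is master".
STATE_DIR="${STATE_DIR:-/var/run/filecluster}"

filecluster_monitor() {
    # Not running at all: report the instance as stopped.
    if [ ! -e "$STATE_DIR/running" ]; then
        return "$OCF_NOT_RUNNING"
    fi
    # Running in the master role.
    if [ -e "$STATE_DIR/master" ]; then
        return "$OCF_RUNNING_MASTER"
    fi
    # Running in the slave role.
    return "$OCF_SUCCESS"
}
```

Returning 8 is only "success" when Pacemaker itself believes the instance is the master; whether it does depends on the promote/demote actions, not on the monitor.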

Does anyone know how I can make pacemaker interpret OCF_RUNNING_MASTER as success?

crm config:

node 3232286770: VStorage1 \
        attributes standby=off
node 3232286771: VStorage2
primitive virtual_ip IPaddr2 \
        params ip=192.168.100.230 cidr_netmask=32 nic=ens256 \
        op monitor interval=10s \
        meta migration-threshold=10
primitive filecluster ocf:msn:cluster \
        op start timeout=120 interval=0 \
        op stop timeout=120 interval=0 \
        op promote timeout=120 interval=0 \
        op demote timeout=120 interval=0 \
        op monitor interval=20s role=Slave \
        op monitor interval=10s role=Master \
        meta migration-threshold=10
ms ms filecluster
property cib-bootstrap-options: \
        have-watchdog=false \
        dc-version=1.1.14-70404b0 \
        cluster-infrastructure=corosync \
        cluster-name=debian \
        stonith-enabled=false \
        no-quorum-policy=ignore

crm status output:

root@VStorage1:/usr/lib/ocf/resource.d# crm status
Last updated: Mon Nov  5 11:21:34 2018          Last change: Fri Nov  2 20:22:53 2018 by root via cibadmin on VStorage1
Stack: corosync
Current DC: VStorage1 (version 1.1.14-70404b0) - partition with quorum
2 nodes and 3 resources configured

Online: [ VStorage1 VStorage2 ]

Full list of resources:

 virtual_ip     (ocf::heartbeat:IPaddr2):       Started VStorage1
 Master/Slave Set: ms [filecluster]
     Slaves: [ VStorage1 ]
     Stopped: [ VStorage2 ]

Failed Actions:
* filecluster_monitor_20000 on VStorage1 'master' (8): call=153, status=complete, exitreason='none',
    last-rc-change='Fri Nov  2 20:27:28 2018', queued=0ms, exec=0ms
* filecluster_monitor_20000 on VStorage2 'master' (8): call=135, status=complete, exitreason='none',
    last-rc-change='Fri Nov  2 20:27:11 2018', queued=0ms, exec=0ms
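
The failed operation's name carries the monitor interval in milliseconds, so filecluster_monitor_20000 is the 20 s monitor, which the configuration above declares with role=Slave. Pacemaker expects a role=Slave monitor to return OCF_SUCCESS (0) and a role=Master monitor to return OCF_RUNNING_MASTER (8), so an 8 from the slave monitor counts as an unexpected result. The sketch below is a simplified model of that check, not Pacemaker's implementation; the function names are hypothetical and the interval-to-role table is copied from this configuration.

```shell
#!/bin/sh
# Simplified model (not Pacemaker's code) of why the failed actions above are flagged.
# Operation keys have the form <resource>_<action>_<interval in ms>.

# Role of each recurring monitor, taken from the crm config in the question:
#   op monitor interval=20s role=Slave
#   op monitor interval=10s role=Master
monitor_role() {
    case "$1" in
        20000) echo Slave ;;
        10000) echo Master ;;
        *)     echo Unknown ;;
    esac
}

# Return code Pacemaker expects from a monitor configured for the given role.
expected_rc() {
    case "$1" in
        Master) echo 8 ;;   # OCF_RUNNING_MASTER
        *)      echo 0 ;;   # OCF_SUCCESS
    esac
}

# Usage: check_monitor <operation key> <actual rc>
check_monitor() {
    interval="${1##*_}"                 # text after the last underscore
    role=$(monitor_role "$interval")
    want=$(expected_rc "$role")
    if [ "$2" -eq "$want" ]; then
        echo "ok: $1 ($role monitor) returned $2"
    else
        echo "FAILED: $1 ($role monitor) expected $want, got $2"
    fi
}
```

With the values from the status output, check_monitor filecluster_monitor_20000 8 reports a failure, while the same rc from the 10 s (role=Master) monitor would be accepted.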

inlineMacro · Accepted Answer · 2019-01-24T07:16:25

A master/slave resource agent will report both instances as slaves only if the promotion to master fails. What is the condition in your OCF agent for promoting to master? See the drbd agent for the condition under which the resource is promoted to master.
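
To illustrate the answer: Pacemaker promotes an instance of a master/slave set only on a node where the agent has set a promotion (master) score, which agents such as drbd do with crm_master. If no node ever gets a score, every instance stays a slave, which matches what the first version of the monitor produced. The sketch below shows that score handling plus matching promote/demote actions. The promotion condition (a data_current marker file), the STATE_DIR layout, the score of 100 and the function names are hypothetical; crm_master and its -Q, -l reboot, -v and -D options are the ones the drbd agent uses. CRM_MASTER exists only so the functions can be dry-run outside a cluster.

```shell
#!/bin/sh
# Sketch of promotion handling for a master/slave agent (hypothetical names).
: "${OCF_SUCCESS:=0}"
: "${OCF_ERR_GENERIC:=1}"

# Hypothetical marker files describing the service state.
STATE_DIR="${STATE_DIR:-/var/run/filecluster}"

# crm_master ships with Pacemaker 1.1 and expects to run inside an agent action
# (it relies on OCF_RESOURCE_INSTANCE). Override with CRM_MASTER=echo for a dry run.
CRM_MASTER="${CRM_MASTER:-crm_master}"

# Called from start/monitor: advertise whether this node may be promoted.
filecluster_set_master_score() {
    # Hypothetical condition: only a node with current data is promotable.
    if [ -e "$STATE_DIR/data_current" ]; then
        $CRM_MASTER -Q -l reboot -v 100    # transient score, cleared on reboot
    else
        $CRM_MASTER -l reboot -D           # remove any existing score
    fi
}

filecluster_promote() {
    # Hypothetical: switch the service into master mode.
    touch "$STATE_DIR/master" || return "$OCF_ERR_GENERIC"
    return "$OCF_SUCCESS"
}

filecluster_demote() {
    # Hypothetical: switch the service back into slave mode.
    rm -f "$STATE_DIR/master" || return "$OCF_ERR_GENERIC"
    return "$OCF_SUCCESS"
}
```

The agent should report OCF_RUNNING_MASTER from monitor only after Pacemaker has called promote; reporting it on its own leaves Pacemaker's view and the agent's view out of sync.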
