1
votes

Hi All,

I'm having an issue automating a process that starts/stops TIBCO EMS from another server via an SSH command.

My script contains the following command, which is executed from SERVER_01 against EMS_SERVER_01 and EMS_SERVER_02 (secondary) to start the EMS instance 8064:

[tibco@SERVER_01 ~]$ ssh tibco@EMS_SERVER_02 "cd /opt/tibco/TIBCOHOME1/ems/8.1/bin;nohup ./tibemsd64 -config "/opt/tibco/CFS/AMX01/BPM_B8064/tibemsd_1178.conf" > /dev/null 2>&1 &"

EMS_SERVER_01 starts without problems, but EMS_SERVER_02 reports the following errors when we execute the command above:

TIBCO Enterprise Message Service.
Copyright 2003-2014 by TIBCO Software Inc.
All rights reserved.

Version 8.1.0 V10 4/11/2014

2018-07-31 17:40:59.514 Process started from './tibemsd64'.
2018-07-31 17:40:59.516 Process Id: 25251
2018-07-31 17:40:59.516 Hostname: EMS_SERVER_02 
2018-07-31 17:40:59.516 Hostname IP address: 16.
2018-07-31 17:40:59.516 Hostname IP address: 16.
2018-07-31 17:40:59.516 Reading configuration from '/opt/tibco/CFS/AMX01/BPM_B8064/tibemsd_1178.conf'.
2018-07-31 17:40:59.516 Logging into file '/opt/tibco/CFS/AMX01/logs/ems_EMS_SERVER_02_8064_secondary.log'
2018-07-31 17:40:59.513 ERROR: Initialization failed: storage for '$QTCA01' not found.
2018-07-31 17:40:59.516 Active server 'tcp://EMS_SERVER_02:8064' not found.
2018-07-31 17:40:59.516 Server is re-entering standby state.
2018-07-31 17:40:59.663 Java Version 1.7.0.01
2018-07-31 17:40:59.670 Server name: 'BPM_B8064'.
2018-07-31 17:40:59.670 Storage Location: '/opt/tibco/CFS/AMX01/BPM_B8064/datastore'.
2018-07-31 17:40:59.670 Routing is enabled.
2018-07-31 17:40:59.670 Flow Control is enabled.
2018-07-31 17:40:59.670 Authorization is enabled.
2018-07-31 17:41:00.850 Secure Socket Layer is enabled, using OpenSSL 0.9.8y-fips 5 Feb 2013
2018-07-31 17:41:00.874 WARNING: Unable to initialize fault tolerant connection, remote server returned 'connect failed: server not in active state'
2018-07-31 17:41:00.874 Continuing as active server.
2018-07-31 17:41:00.872 [BPM_B8064@EMS_SERVER_02 ]: connect failed: server not in active state
2018-07-31 17:41:00.876 Accepting connections on tcp://EMS_SERVER_02:8064.
2018-07-31 17:41:00.879 Accepting connections on ssl://EMS_SERVER_02:8065.
2018-07-31 17:41:00.879 Recovering state, please wait.
2018-07-31 17:41:01.130 SEVERE ERROR: Unable to open store [$QTCA01]: [ ESTATUS = 230, ERRSTR = java.lang.UnsatisfiedLinkError: no ocijdbc11 in java.library.path ]
2018-07-31 17:41:01.136 ERROR: Initialization failed: storage for '$QTCA01' not found.
2018-07-31 17:41:01.136 FATAL: Exception in startup, exiting.

... and the process doesn't start.

The strange thing is that if I first connect to EMS_SERVER_02 and then execute the command, the process starts without problems:

[tibco@SERVER_01 ~]$ ssh EMS_SERVER_02 
*******************************************************************
*******************************************************************
Login successfully: EMS_SERVER_02  
*****************************************************************
*****************************************************************
Last login:  31 Jul 18:32
$ cd /opt/tibco/TIBCOHOME1/ems/8.1/bin;nohup ./tibemsd64 -config "/opt/tibco/CFS/AMX01/BPM_B8064/tibemsd_1178.conf" > /dev/null 2>&1 &
[1]     28410
$ ps -ef | grep 8064
   tibco 28410     1  0 18:34:22 pts/1     0:05 ./tibemsd64 -config 
TIBCO Enterprise Message Service.
Copyright 2003-2014 by TIBCO Software Inc.
All rights reserved.

Version 8.1.0 V10 4/11/2014

2018-07-31 18:34:22.984 Process started from './tibemsd64'.
2018-07-31 18:34:22.985 Process Id: 28410
2018-07-31 18:34:22.985 Hostname: EMS_SERVER_02
2018-07-31 18:34:22.985 Hostname IP address: 16.
2018-07-31 18:34:22.985 Hostname IP address: 16.
2018-07-31 18:34:22.985 Reading configuration from '/opt/tibco/CFS/AMX01/BPM_B8064/tibemsd_1178.conf'.
2018-07-31 18:34:22.985 Logging into file '/opt/tibco/CFS/AMX01/logs/ems_EMS_SERVER_02_8064_secondary.log'
2018-07-31 18:34:23.149 Java Version 1.7.0.01
2018-07-31 18:34:23.152 Server name: 'BPM_B8064'.
2018-07-31 18:34:23.152 Storage Location: '/opt/tibco/CFS/AMX01/BPM_B8064/datastore'.
2018-07-31 18:34:23.152 Routing is enabled.
2018-07-31 18:34:23.153 Flow Control is enabled.
2018-07-31 18:34:23.153 Authorization is enabled.
2018-07-31 18:34:24.333 Secure Socket Layer is enabled, using OpenSSL 0.9.8y-fips 5 Feb 2013
2018-07-31 18:34:24.357 WARNING: Unable to initialize fault tolerant connection, remote server returned 'connect failed: server not in active state'
2018-07-31 18:34:24.357 Continuing as active server.
2018-07-31 18:34:24.355 [BPM_B8064@EMS_SERVER_02]: connect failed: server not in active state
2018-07-31 18:34:24.358 Accepting connections on tcp://EMS_SERVER_02:8064.
2018-07-31 18:34:24.362 Accepting connections on ssl://EMS_SERVER_02:8065.
2018-07-31 18:34:24.362 Recovering state, please wait.
2018-07-31 18:34:26.239 Store '$QTCA01' locked by 'BPM_B8064'
2018-07-31 18:34:26.755 Recovered 5 messages.
2018-07-31 18:34:26.781 Server is active.
2018-07-31 18:34:29.936 Missing heartbeats from active server 'tcp://EMS_SERVER_02:8064'.
2018-07-31 18:34:29.941 Server activating on failure of 'tcp://EMS_SERVER_02:8064'.
2018-07-31 18:34:29.941 Server rereading configuration.
2018-07-31 18:34:30.387 Recovering state, please wait.
2018-07-31 18:34:30.389 SEVERE ERROR: Unable to open store [$QTCA01]: [ ESTATUS = 230, ERRSTR = java.lang.UnsatisfiedLinkError: no ocijdbc11 in java.library.path ]
2018-07-31 18:34:30.390 ERROR: Unable to open store file '/opt/tibco/CFS/AMX01/BPM_B8064/datastore/async-msgs.db', file may be locked.
2018-07-31 18:34:30.391 ERROR: Unable to open store file '/opt/tibco/CFS/AMX01/BPM_B8064/datastore/sync-msgs.db', file may be locked.
2018-07-31 18:34:30.392 ERROR: Unable to open store file '/opt/tibco/CFS/AMX01/BPM_B8064/datastore/meta.db', file may be locked.
2018-07-31 18:34:30.392 ERROR: Initialization failed: storage for '$QTCA01' not found.
2018-07-31 18:34:30.393 Server is re-entering standby state.
2018-07-31 18:34:30.394 Standby server 'BPM_B8064@EMS_SERVER_01' has connected.

Am I missing some configuration? Or am I doing anything wrong?

I would appreciate your help on this.

2
Your call to ./tibemsd64 .... > /dev/null 2>&1 & is discarding possibly important error messages. I would redirect like > /tmp/timbmscd64.log 2>&1 and then examine that. Based on the path problems listed as errors, there is almost certainly something different between the two environments you're running these under. Put env > $HOSTNAME.env at the top of your script, and then compare the files from both machines. Not sure if a TIBCO support contract would help on a problem like this, but it's also worth opening a ticket (if you haven't tried). Just general advice, no TIBCO experience. Good luck. - shellter
Do your two configurations have ft_active set? It must point to the IP of the 'other EMS daemon'. Also check this tutorial: tutorialspedia.com/… - Axel Podehl
@shellter thanks for your comment. I've redirected the output to a file and also realized that the env variables are different in the two scenarios. - user10162496
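
As a follow-up to the env comparison suggested in the comments, here is a minimal sketch for comparing two env dumps. The helper name env_diff and the dump file names are made up for illustration; nothing here is TIBCO-specific. It sorts both dumps and prints only the lines that differ.

```shell
#!/bin/sh
# env_diff FILE_A FILE_B
# Print the variables that differ between two `env` dumps.
# Lines starting with "<" exist only in FILE_A, ">" only in FILE_B.
# (env_diff is a hypothetical helper name, not an existing tool.)
env_diff() {
    a_sorted=$(mktemp) && b_sorted=$(mktemp) || return 1
    # Sort first so that ordering differences are not reported as changes.
    sort "$1" > "$a_sorted"
    sort "$2" > "$b_sorted"
    # Keep only the changed lines, drop diff's position markers (e.g. "2d1").
    diff "$a_sorted" "$b_sorted" | grep '^[<>]'
    rm -f "$a_sorted" "$b_sorted"
}
```

One way to get the two dumps (file names are placeholders): run ssh tibco@EMS_SERVER_02 env > /tmp/noninteractive.env from SERVER_01, run env > /tmp/interactive.env inside a normal login session on EMS_SERVER_02, copy both files to the same machine, and call env_diff /tmp/interactive.env /tmp/noninteractive.env. Given the "no ocijdbc11 in java.library.path" error, a library path variable such as LD_LIBRARY_PATH is a plausible candidate, but that is a guess until the diff confirms it.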

2 Answers

2
votes

The error you have is related to an Oracle datastore:

java.lang.UnsatisfiedLinkError: no ocijdbc11 in java.library.path

The second log also shows errors on the default file storage: just check that the folder exists and is writable by the Linux user that runs EMS.

You should be aware that a JDBC datastore usually kills EMS performance. It also adds a layer of complexity that is likely to fail, for example a full tablespace or a full Oracle filesystem. As soon as EMS can't write to a store, it will fail over to the other instance.

A few recommendations:

  • Put your start commands in a script like start-ems.sh
  • Remove the store that uses OCI JDBC and forget about it; have it work with standard file storage. (An NFSv4 share can do the job, GFS preferred)
  • Fix the file storage issue
  • For AMX BPM, don't use the EMS 'admin' account; create a standard user with the correct settings (see the install documentation)
  • Rename tibemsd.conf as follows: tibemsd-AMXBPM-8064.conf. That way, in a 'ps' listing you know what the EMS is for.
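
A rough sketch of what such a start-ems.sh could look like, using the paths from the question as defaults. The variable names, the log file location, and the choice of $HOME/.profile as the place where the Oracle client library path is exported are all assumptions; adjust them to whatever your interactive login actually sources. The idea is to load the login environment explicitly, since a command passed directly to ssh usually does not run the login profile, and to keep the server output instead of sending it to /dev/null.

```shell
#!/bin/sh
# start-ems.sh (sketch). Defaults come from the question; override via environment.
EMS_BIN_DIR=${EMS_BIN_DIR:-/opt/tibco/TIBCOHOME1/ems/8.1/bin}
EMS_CONFIG=${EMS_CONFIG:-/opt/tibco/CFS/AMX01/BPM_B8064/tibemsd_1178.conf}
# ASSUMPTION: log location is arbitrary, pick any writable path.
EMS_LOG=${EMS_LOG:-/tmp/tibemsd64_8064.log}
# ASSUMPTION: this profile exports the library path needed for ocijdbc11.
EMS_PROFILE=${EMS_PROFILE:-$HOME/.profile}

start_ems() {
    # Load the environment that an interactive login would have set up.
    [ -r "$EMS_PROFILE" ] && . "$EMS_PROFILE"
    cd "$EMS_BIN_DIR" || return 1
    # Same command as in the question, but the output is kept in a log file.
    nohup ./tibemsd64 -config "$EMS_CONFIG" > "$EMS_LOG" 2>&1 &
    echo "started tibemsd64, pid $!"
}
```

To use it as a standalone script, add a final start_ems line; the remote call then reduces to running that script over ssh, and the same environment is used whether it is started interactively or from SERVER_01.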
0
votes

Your shared datastore is probably corrupted due to a network break.

So:

  • stop all EMS instances
  • take a backup of the datastore files (.db)
  • remove the datastore files (.db)
  • restart the EMS instances
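
The backup and remove steps could be sketched like this. backup_datastore is a hypothetical helper name and the backup directory is a placeholder. It only handles the .db files; stopping and restarting the EMS instances is left to you and must happen before and after. Be aware that removing the store files also discards whatever pending messages they contain, so keep the backup.

```shell
#!/bin/sh
# backup_datastore DATASTORE_DIR BACKUP_DIR
# Copy every *.db file into BACKUP_DIR, then remove it from DATASTORE_DIR.
# Run this only while all EMS instances are stopped.
# (backup_datastore is a hypothetical helper name.)
backup_datastore() {
    datastore=$1
    backup_dir=$2
    mkdir -p "$backup_dir" || return 1
    for f in "$datastore"/*.db; do
        [ -e "$f" ] || continue               # pattern matched nothing
        cp -p "$f" "$backup_dir"/ || return 1 # stop if the copy fails
        rm -f "$f"                            # remove only after a good copy
    done
}
```

With the storage location from the log above, a call would look like backup_datastore /opt/tibco/CFS/AMX01/BPM_B8064/datastore /some/backup/dir, where the second argument is whatever backup location you choose.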