
I'm following http://jayatiatblogs.blogspot.com/2011/11/storm-installation.html to try to configure an Apache Storm remote cluster using a few virtual machines (EC2) running Ubuntu 14.04 LTS on Amazon Web Services.

My master node is 10.0.0.230 and my slave node is 10.0.0.79. ZooKeeper resides on my master node. When I run storm jar storm-starter-0.9.4-jar-with-dependencies.jar storm.starter.RollingTopWords production-topology remote on the master node, the messages below appear, indicating that the topology was submitted successfully:

339  [main] INFO  storm.starter.RollingTopWords - Topology name: production-topology
377  [main] INFO  storm.starter.RollingTopWords - Running in remote (cluster) mode
651  [main] INFO  backtype.storm.StormSubmitter - Jar not uploaded to master yet. Submitting jar...
655  [main] INFO  backtype.storm.StormSubmitter - Uploading topology jar storm-starter-0.9.4-jar-with-dependencies.jar to assigned location: /home/ubuntu/storm/data/nimbus/inbox/stormjar-380bb1a2-1699-4ad1-8341-3d4b92c14764.jar
672  [main] INFO  backtype.storm.StormSubmitter - Successfully uploaded topology jar to assigned location: /home/ubuntu/storm/data/nimbus/inbox/stormjar-380bb1a2-1699-4ad1-8341-3d4b92c14764.jar
672  [main] INFO  backtype.storm.StormSubmitter - Submitting topology production-topology in distributed mode with conf {"topology.debug":true}
714  [main] INFO  backtype.storm.StormSubmitter - Finished submitting topology: production-topology

The Storm UI and the storm list command show that the topology is active:

Topology_name        Status     Num_tasks  Num_workers  Uptime_secs
-------------------------------------------------------------------
production-topology  ACTIVE     0          0            59

However, the Cluster Summary of the Storm UI shows 0 supervisors, 0 used slots, 0 free slots, 0 executors and 0 tasks. In the Topology Configuration, supervisor.slots.ports shows the default supervisor slot ports of the master node instead of the supervisor slot ports of the slave node.

Below is the zoo.cfg of my master node:

tickTime=2000
dataDir=/home/ubuntu/zookeeper-data
clientPort=2181

The storm.yaml of my master node:

 storm.zookeeper.servers:
     - "10.0.0.230"
 storm.zookeeper.port: 2181

 nimbus.host: "localhost"
 nimbus.thrift.port: 6627
 nimbus.task.launch.secs: 240

 supervisor.worker.start.timeout.secs: 240
 supervisor.worker.timeout.secs: 240

 storm.local.dir: "/home/ubuntu/storm/data"   
 java.library.path: "/usr/lib/jvm/java-7-oracle"

The storm.yaml of my slave node:

 storm.zookeeper.server:
     - "10.0.0.230"
 storm.zookeeper.port: 2181
 nimbus.host: "10.0.0.230"
 nimbus.thrift.port: 6627

 storm.local.dir: "/home/ubuntu/storm/data"
 java.library.path: "/usr/lib/jvm/java-7-oracle"

 supervisor.slots.ports:
     - 6700
     - 6701
     - 6702
     - 6703
     - 6704

I used zkCli.sh -server 10.0.0.230:2181 to connect to ZooKeeper on the master node, and it works fine:

2015-05-04 03:40:20,866 [myid:] - INFO  [main:ZooKeeper@438] - Initiating client connection, connectString=10.0.0.230:2181 sessionTimeout=30000 watcher=org.apache.zookeeper.ZooKeeperMain$MyWatcher@63f78dde
2015-05-04 03:40:20,888 [myid:] - INFO  [main-SendThread(10.0.0.230:2181):ClientCnxn$SendThread@975] - Opening socket connection to server 10.0.0.230/10.0.0.230:2181. Will not attempt to authenticate using SASL (unknown error)
Welcome to ZooKeeper!
2015-05-04 03:40:20,900 [myid:] - INFO  [main-SendThread(10.0.0.230:2181):ClientCnxn$SendThread@852] - Socket connection established to 10.0.0.230/10.0.0.230:2181, initiating session
JLine support is enabled
2015-05-04 03:40:20,918 [myid:] - INFO  [main-SendThread(10.0.0.230:2181):ClientCnxn$SendThread@1235] - Session establishment complete on server 10.0.0.230/10.0.0.230:2181, sessionid = 0x14d1ca1ab73001c, negotiated timeout = 30000

WATCHER::

WatchedEvent state:SyncConnected type:None path:null
[zk: 10.0.0.230:2181(CONNECTED) 0]

Below are the supervisor logs from my slave node:

2015-05-06T06:16:28.487+0000 o.a.s.z.ClientCnxn [INFO] Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
2015-05-06T06:16:28.487+0000 o.a.s.z.ClientCnxn [WARN] Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[na:1.7.0_80]
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:744) ~[na:1.7.0_80]
        at org.apache.storm.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361) ~[storm-core-0.9.4.jar:0.9.4]
        at org.apache.storm.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081) ~[storm-core-0.9.4.jar:0.9.4]
2015-05-06T06:16:28.589+0000 b.s.d.supervisor [ERROR] Error on initialization of server mk-supervisor
java.lang.RuntimeException: org.apache.storm.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /storm
        at backtype.storm.util$wrap_in_runtime.invoke(util.clj:44) ~[storm-core-0.9.4.jar:0.9.4]
        at backtype.storm.zookeeper$exists_node_QMARK_$fn__807.invoke(zookeeper.clj:102) ~[storm-core-0.9.4.jar:0.9.4]
        at backtype.storm.zookeeper$exists_node_QMARK_.invoke(zookeeper.clj:98) ~[storm-core-0.9.4.jar:0.9.4]
        at backtype.storm.zookeeper$mkdirs.invoke(zookeeper.clj:114) ~[storm-core-0.9.4.jar:0.9.4]
        at backtype.storm.cluster$mk_distributed_cluster_state.invoke(cluster.clj:43) ~[storm-core-0.9.4.jar:0.9.4]
        at backtype.storm.cluster$mk_storm_cluster_state.invoke(cluster.clj:238) ~[storm-core-0.9.4.jar:0.9.4]
        at backtype.storm.daemon.supervisor$supervisor_data.invoke(supervisor.clj:214) ~[storm-core-0.9.4.jar:0.9.4]
        at backtype.storm.daemon.supervisor$fn__5518$exec_fn__1754__auto____5519.invoke(supervisor.clj:409) ~[storm-core-0.9.4.jar:0.9.4]
        at clojure.lang.AFn.applyToHelper(AFn.java:167) [clojure-1.5.1.jar:na]
        at clojure.lang.AFn.applyTo(AFn.java:151) [clojure-1.5.1.jar:na]
        at clojure.core$apply.invoke(core.clj:617) ~[clojure-1.5.1.jar:na]
        at backtype.storm.daemon.supervisor$fn__5518$mk_supervisor__5544.doInvoke(supervisor.clj:405) [storm-core-0.9.4.jar:0.9.4]
        at clojure.lang.RestFn.invoke(RestFn.java:436) [clojure-1.5.1.jar:na]
        at backtype.storm.daemon.supervisor$_launch.invoke(supervisor.clj:629) [storm-core-0.9.4.jar:0.9.4]
        at backtype.storm.daemon.supervisor$_main.invoke(supervisor.clj:659) [storm-core-0.9.4.jar:0.9.4]
        at clojure.lang.AFn.applyToHelper(AFn.java:159) [clojure-1.5.1.jar:na]
        at clojure.lang.AFn.applyTo(AFn.java:151) [clojure-1.5.1.jar:na]
        at backtype.storm.daemon.supervisor.main(Unknown Source) [storm-core-0.9.4.jar:0.9.4]
Caused by: org.apache.storm.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /storm
        at org.apache.storm.zookeeper.KeeperException.create(KeeperException.java:99) ~[storm-core-0.9.4.jar:0.9.4]
        at org.apache.storm.zookeeper.KeeperException.create(KeeperException.java:51) ~[storm-core-0.9.4.jar:0.9.4]
        at org.apache.storm.zookeeper.ZooKeeper.exists(ZooKeeper.java:1045) ~[storm-core-0.9.4.jar:0.9.4]
        at org.apache.storm.curator.framework.imps.ExistsBuilderImpl$2.call(ExistsBuilderImpl.java:172) ~[storm-core-0.9.4.jar:0.9.4]
        at org.apache.storm.curator.framework.imps.ExistsBuilderImpl$2.call(ExistsBuilderImpl.java:161) ~[storm-core-0.9.4.jar:0.9.4]
        at org.apache.storm.curator.RetryLoop.callWithRetry(RetryLoop.java:107) ~[storm-core-0.9.4.jar:0.9.4]
        at org.apache.storm.curator.framework.imps.ExistsBuilderImpl.pathInForeground(ExistsBuilderImpl.java:157) ~[storm-core-0.9.4.jar:0.9.4]
        at org.apache.storm.curator.framework.imps.ExistsBuilderImpl.forPath(ExistsBuilderImpl.java:148) ~[storm-core-0.9.4.jar:0.9.4]
        at org.apache.storm.curator.framework.imps.ExistsBuilderImpl.forPath(ExistsBuilderImpl.java:36) ~[storm-core-0.9.4.jar:0.9.4]
        at backtype.storm.zookeeper$exists_node_QMARK_$fn__807.invoke(zookeeper.clj:101) ~[storm-core-0.9.4.jar:0.9.4]
        ... 16 common frames omitted
2015-05-06T06:16:28.607+0000 b.s.util [ERROR] Halting process: ("Error on initialization")
java.lang.RuntimeException: ("Error on initialization")
        at backtype.storm.util$exit_process_BANG_.doInvoke(util.clj:325) [storm-core-0.9.4.jar:0.9.4]
        at clojure.lang.RestFn.invoke(RestFn.java:423) [clojure-1.5.1.jar:na]
        at backtype.storm.daemon.supervisor$fn__5518$mk_supervisor__5544.doInvoke(supervisor.clj:405) [storm-core-0.9.4.jar:0.9.4]
        at clojure.lang.RestFn.invoke(RestFn.java:436) [clojure-1.5.1.jar:na]
        at backtype.storm.daemon.supervisor$_launch.invoke(supervisor.clj:629) [storm-core-0.9.4.jar:0.9.4]
        at backtype.storm.daemon.supervisor$_main.invoke(supervisor.clj:659) [storm-core-0.9.4.jar:0.9.4]
        at clojure.lang.AFn.applyToHelper(AFn.java:159) [clojure-1.5.1.jar:na]
        at clojure.lang.AFn.applyTo(AFn.java:151) [clojure-1.5.1.jar:na]
        at backtype.storm.daemon.supervisor.main(Unknown Source) [storm-core-0.9.4.jar:0.9.4]

Below are my nimbus logs from my master node:

2015-05-06T06:14:19.291+0000 b.s.d.nimbus [INFO] Using default scheduler
2015-05-06T06:14:19.304+0000 b.s.u.StormBoundedExponentialBackoffRetry [INFO] The baseSleepTimeMs [1000] the maxSleepTimeMs [30000] the maxRetries [5]
2015-05-06T06:14:19.415+0000 o.a.s.c.f.i.CuratorFrameworkImpl [INFO] Starting
2015-05-06T06:14:19.417+0000 o.a.s.z.ZooKeeper [INFO] Initiating client connection, connectString=10.0.0.230:2181 sessionTimeout=20000 watcher=org.apache.storm.curator.ConnectionState@795bca46
2015-05-06T06:14:19.436+0000 o.a.s.z.ClientCnxn [INFO] Opening socket connection to server 10.0.0.230/10.0.0.230:2181. Will not attempt to authenticate using SASL (unknown error)
2015-05-06T06:14:19.448+0000 o.a.s.z.ClientCnxn [INFO] Socket connection established to 10.0.0.230/10.0.0.230:2181, initiating session
2015-05-06T06:14:19.457+0000 o.a.s.z.ClientCnxn [INFO] Session establishment complete on server 10.0.0.230/10.0.0.230:2181, sessionid = 0x14d27dbda310000, negotiated timeout = 20000
2015-05-06T06:14:19.459+0000 o.a.s.c.f.s.ConnectionStateManager [INFO] State change: CONNECTED
2015-05-06T06:14:19.460+0000 b.s.zookeeper [INFO] Zookeeper state update: :connected:none
2015-05-06T06:14:20.485+0000 o.a.s.z.ClientCnxn [INFO] EventThread shut down
2015-05-06T06:14:20.485+0000 o.a.s.z.ZooKeeper [INFO] Session: 0x14d27dbda310000 closed
2015-05-06T06:14:20.486+0000 b.s.u.StormBoundedExponentialBackoffRetry [INFO] The baseSleepTimeMs [1000] the maxSleepTimeMs [30000] the maxRetries [5]
2015-05-06T06:14:20.487+0000 o.a.s.c.f.i.CuratorFrameworkImpl [INFO] Starting
2015-05-06T06:14:20.487+0000 o.a.s.z.ZooKeeper [INFO] Initiating client connection, connectString=10.0.0.230:2181/storm sessionTimeout=20000 watcher=org.apache.storm.curator.ConnectionState@510d246b
2015-05-06T06:14:20.504+0000 o.a.s.z.ClientCnxn [INFO] Opening socket connection to server 10.0.0.230/10.0.0.230:2181. Will not attempt to authenticate using SASL (unknown error)
2015-05-06T06:14:20.505+0000 o.a.s.z.ClientCnxn [INFO] Socket connection established to 10.0.0.230/10.0.0.230:2181, initiating session
2015-05-06T06:14:20.507+0000 o.a.s.z.ClientCnxn [INFO] Session establishment complete on server 10.0.0.230/10.0.0.230:2181, sessionid = 0x14d27dbda310001, negotiated timeout = 20000
2015-05-06T06:14:20.507+0000 o.a.s.c.f.s.ConnectionStateManager [INFO] State change: CONNECTED
2015-05-06T06:14:20.547+0000 b.s.d.nimbus [INFO] Starting Nimbus server...

I ran storm nimbus and storm ui on my master node, and storm supervisor on my slave node.

The supervisor logs from my slave node show that the slave node tries to connect to ZooKeeper on localhost, although I specified in the slave node's storm.yaml that ZooKeeper is on my master node. Why does this happen, and how can I solve it?

So why does the Cluster Summary of the Storm UI show 0 supervisors, 0 used slots, 0 free slots, 0 executors and 0 tasks? Why does it use the supervisor slot ports of the master node instead of those of the slave node?

When I click production-topology in the Topology Summary of the Storm UI, it shows 0 Num workers, 0 Num executors and 0 Num tasks. Why is no information displayed for Spouts and Bolts?
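
To make the localhost symptom easier to spot, here is a minimal Python sketch that pulls the ZooKeeper connection targets out of a Storm log and flags any that are not the expected server. It assumes the log line format shown in the supervisor and nimbus logs above; the function names and the sample strings are only illustrative, not part of Storm.

```python
import re

# Matches lines such as:
#   "... Opening socket connection to server localhost/127.0.0.1:2181. ..."
PATTERN = re.compile(r"Opening socket connection to server (\S+?)/(\S+?):(\d+)")


def zookeeper_targets(log_text):
    """Return the set of (ip, port) pairs the process tried to connect to."""
    targets = set()
    for match in PATTERN.finditer(log_text):
        _host, ip, port = match.groups()
        targets.add((ip, int(port)))
    return targets


def unexpected_targets(log_text, expected_ips):
    """Return the connection targets whose IP is not in expected_ips."""
    return sorted(t for t in zookeeper_targets(log_text) if t[0] not in expected_ips)


# One line from each log in the question, shortened for the example.
SUPERVISOR_LINE = (
    "2015-05-06T06:16:28.487+0000 o.a.s.z.ClientCnxn [INFO] Opening socket "
    "connection to server localhost/127.0.0.1:2181. Will not attempt to "
    "authenticate using SASL (unknown error)"
)
NIMBUS_LINE = (
    "2015-05-06T06:14:19.436+0000 o.a.s.z.ClientCnxn [INFO] Opening socket "
    "connection to server 10.0.0.230/10.0.0.230:2181. Will not attempt to "
    "authenticate using SASL (unknown error)"
)

print(unexpected_targets(SUPERVISOR_LINE, {"10.0.0.230"}))  # [('127.0.0.1', 2181)]
print(unexpected_targets(NIMBUS_LINE, {"10.0.0.230"}))      # []
```

Run against the full supervisor log, a non-empty result would confirm that the supervisor is not using the ZooKeeper address intended in storm.yaml.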

First, check whether your supervisor is running. As you mentioned, there are 0 supervisors, so the topology has not been assigned yet, which is why you don't see any information about spouts and bolts. Did you start storm supervisor after starting storm nimbus? - sahu
I did start my supervisor. Please refer to my edited question. - Toshihiko
Check on the slave node whether the supervisor is running. If it is running and your configuration is correct, it should show up in the Storm UI by itself. I think your supervisor is not able to connect to Nimbus. Check the supervisor logs to see whether it is connected to Nimbus. - sahu
As @sahu says, your Supervisor process is not connecting to your Nimbus host. There are many possible reasons for that. The first places to look are your Storm configuration (i.e. storm.yaml) and the supervisor's logs (either in /var/log/storm or wherever you installed Storm). - nelsonda
Sahu & Doomy, I have added the supervisor logs from my slave node and the nimbus logs from my master node. From the supervisor logs, it looks as if the supervisor is connecting to localhost instead of the master. How should I solve this? - Toshihiko

1 Answer


I found the problem: I should have set my ZooKeeper on my slave nodes, not on my master node. Now the problem is solved and the Storm cluster is up.
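
One detail is visible in the configuration posted in the question: the slave's storm.yaml spells the key storm.zookeeper.server, while the master's uses storm.zookeeper.servers, which is the key name Storm defines. If the unrecognised key is simply ignored, the supervisor would fall back to the default ZooKeeper host, which would be consistent with the localhost/127.0.0.1:2181 line in the supervisor log. This is an observation from the posted files, not something confirmed in the answer above. A minimal Python sketch for catching such near-miss keys follows; the list of known keys is limited to the ones that appear in the question, and the helper names are hypothetical.

```python
import difflib
import re

# Key names taken from the two storm.yaml files in the question (Storm 0.9.x).
# This is not the complete set of Storm settings.
KNOWN_KEYS = [
    "storm.zookeeper.servers",
    "storm.zookeeper.port",
    "storm.local.dir",
    "nimbus.host",
    "nimbus.thrift.port",
    "nimbus.task.launch.secs",
    "supervisor.slots.ports",
    "supervisor.worker.start.timeout.secs",
    "supervisor.worker.timeout.secs",
    "java.library.path",
]


def top_level_keys(yaml_text):
    """Return the mapping keys found in a simple, flat storm.yaml-style text."""
    keys = []
    for line in yaml_text.splitlines():
        stripped = line.strip()
        # Skip blank lines, comments and list items such as '- "10.0.0.230"'.
        if not stripped or stripped.startswith("#") or stripped.startswith("-"):
            continue
        match = re.match(r"([A-Za-z0-9_.\-]+)\s*:", stripped)
        if match:
            keys.append(match.group(1))
    return keys


def suspicious_keys(yaml_text, known=KNOWN_KEYS):
    """Map each unrecognised key to its closest known key (or None)."""
    report = {}
    for key in top_level_keys(yaml_text):
        if key not in known:
            close = difflib.get_close_matches(key, known, n=1, cutoff=0.8)
            report[key] = close[0] if close else None
    return report


# The first lines of the slave node's storm.yaml as posted in the question.
SLAVE_YAML = """
 storm.zookeeper.server:
     - "10.0.0.230"
 storm.zookeeper.port: 2181
 nimbus.host: "10.0.0.230"
 nimbus.thrift.port: 6627
"""

print(suspicious_keys(SLAVE_YAML))
# {'storm.zookeeper.server': 'storm.zookeeper.servers'}
```

The check is deliberately line-based so that it needs only the standard library; for nested or more complex YAML, a real YAML parser would be a better fit.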