4
votes

1. I have a topology (1 spout and 3 bolts) and 3 machines (1 Nimbus and 2 worker nodes). Do I need to run my topology on all 3 nodes, or is running it on Nimbus enough? Will Nimbus take care of distributing the code to the other nodes?

2. Will my spout run on Nimbus or on one of the worker nodes?

3. Will the 3 bolts run on 3 separate nodes or on the same node? Does Nimbus take care of that?

4. How do we track the processing of a bolt on the nodes?

5. Is there any documentation that explains the complete flow of processing a message in Storm?

3 Answers

0
votes

Answers:

  1. Nimbus only manages the topology. It handles tasks such as deploying the main jar, but it does not do any of the topology's processing itself, so you definitely need worker nodes. It is good to have more than one worker node so that the topology can survive the failure of a worker. When starting a worker node, you specify the Nimbus server, and the worker then automatically picks up the topology jar to run on that node.

  2. As mentioned in the first point, Nimbus is just a manager and does no processing at all, so the spout runs on worker nodes only.

  3. For your third question, I recommend reading this: http://www.michael-noll.com/blog/2012/10/16/understanding-the-parallelism-of-a-storm-topology/

  4. Look at the Storm UI once your topology is running. It shows how each step in the pipeline is performing, so you can tune the parallelism of each step based on what you see.

  5. To look into Storm in detail, go through the Storm tutorial on GitHub: https://github.com/nathanmarz/storm/wiki/_pages

0
votes

I have a topology (1 spout and 3 bolts) and 3 machines (1 Nimbus and 2 worker nodes). Do I need to run my topology on all 3 nodes, or is running it on Nimbus enough?

Using multiple nodes lets you distribute the load across the cluster, so having more than one node is definitely beneficial. In case of a node failure, Nimbus can also reassign the tasks to another machine. However, it is possible to set up Storm on a single node and run everything on one machine.

Will Nimbus take care of distributing the code to the other nodes?

Yes.

Will my spout run on Nimbus or on one of the worker nodes?

On the worker nodes. The worker (slave) nodes are responsible for executing the topology; each of them runs a daemon called the Supervisor.

Will the 3 bolts run on 3 separate nodes or on the same node? Does Nimbus take care of that?

The distribution is taken care of by Nimbus (the master node). Nimbus is responsible for distributing code around the cluster, assigning tasks to machines, and monitoring for failures.
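
As a rough illustration of what "assigning tasks to machines" could look like for the topology in the question, the sketch below deals one spout and three bolts out to two worker nodes in round-robin order. It is a simplified model and not Storm's scheduler code: the class `EvenAssignmentSketch`, the method `assign`, and the component and node names are all made up for illustration. The actual placement depends on the number of workers and the parallelism configured for each component.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class EvenAssignmentSketch {

    // Hypothetical helper: deals components out to worker nodes in round-robin order.
    // This only models an "even spread"; it is not how Storm is implemented.
    public static Map<String, List<String>> assign(List<String> components,
                                                   List<String> workerNodes) {
        if (workerNodes.isEmpty()) {
            throw new IllegalArgumentException("at least one worker node is required");
        }
        // LinkedHashMap keeps the worker nodes in the order they were given.
        Map<String, List<String>> assignment = new LinkedHashMap<>();
        for (String node : workerNodes) {
            assignment.put(node, new ArrayList<>());
        }
        for (int i = 0; i < components.size(); i++) {
            String node = workerNodes.get(i % workerNodes.size());
            assignment.get(node).add(components.get(i));
        }
        return assignment;
    }

    public static void main(String[] args) {
        // Names are illustrative; Nimbus is deliberately not in the worker list.
        List<String> components = Arrays.asList("spout", "bolt-1", "bolt-2", "bolt-3");
        List<String> workers = Arrays.asList("worker-node-1", "worker-node-2");
        assign(components, workers)
                .forEach((node, parts) -> System.out.println(node + " -> " + parts));
    }
}
```

With these inputs each worker node ends up with two components, and nothing is placed on the Nimbus machine, which matches the division of roles described above.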

How do we track the processing of a bolt on the nodes?

Storm provides a web-based user interface that can optionally be launched on the master node (the one running the Nimbus daemon). The Storm UI gives a basic overview of the cluster state by showing cluster-level and topology-level diagnostics. It can be launched using the following commands:

    # cd /path/to/storm/install/dir
    # bin/storm ui

By default it listens on port 8080 and can be viewed at http://nimbus_host:8080/ in your browser.

Is there any documentation that explains the complete flow of processing a message in Storm?

The Storm wiki is a great place to learn how Storm works. You can also follow the basic tutorial for more details.

0
votes

Nimbus is responsible for distributing the jobs, but all your code, including spouts and bolts, will run on the worker nodes.

You should be able to track the progress of bolts through the Storm UI. You can also log some messages to check how data flows through the topology.
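
To illustrate the logging suggestion, here is a minimal sketch that uses only the JDK's `java.util.logging`. The class `LoggingStepSketch` and its `process` method are hypothetical stand-ins for a bolt's per-tuple method, and the uppercase conversion is just a placeholder for real work. Which logging framework your Storm version expects, and where the worker logs end up, is something to check for your own setup.

```java
import java.util.Locale;
import java.util.logging.Logger;

public class LoggingStepSketch {
    private static final Logger LOG =
            Logger.getLogger(LoggingStepSketch.class.getName());

    // Counts processed messages so progress is visible in the log.
    private long processed = 0;

    // Hypothetical stand-in for the method a bolt runs once per incoming tuple.
    public String process(String message) {
        processed++;
        LOG.info("received message #" + processed + ": " + message);
        // Placeholder for the bolt's real work.
        String result = message.toUpperCase(Locale.ROOT);
        LOG.info("emitting: " + result);
        return result;
    }

    public long getProcessed() {
        return processed;
    }

    public static void main(String[] args) {
        LoggingStepSketch step = new LoggingStepSketch();
        step.process("hello");
        step.process("storm");
    }
}
```

Logging on both receipt and emit makes it possible to follow a single message from one step to the next when reading the logs of the node that ran it.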