2 votes

I recently started with Apache Storm and just finished building my first topologies (all in Java).

As a next step, I wanted to put sensor values from a TI SensorTag, which is connected to a Raspberry Pi, into one of these topologies.

I'm able to send the sensor data via HTTP, but I'm not sure how to implement a working spout that takes in these requests.

Idea of the topology: it should take in the HTTP requests with the sensor value information, emit this data into the topology, and afterwards write it to a file/database using a bolt.
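
For reference, this is roughly the wiring I have in mind (just a sketch; SensorHttpSpout and FileWriterBolt are placeholder names for the pieces I'm asking about, so this won't compile as-is):

    import org.apache.storm.Config;
    import org.apache.storm.LocalCluster;
    import org.apache.storm.topology.TopologyBuilder;

    public class SensorTopology {
        public static void main(String[] args) throws Exception {
            TopologyBuilder builder = new TopologyBuilder();
            // SensorHttpSpout and FileWriterBolt are placeholders for the parts I'm asking about
            builder.setSpout("sensor-spout", new SensorHttpSpout(), 1);
            builder.setBolt("file-writer", new FileWriterBolt(), 1)
                   .shuffleGrouping("sensor-spout");

            // run locally while testing
            LocalCluster cluster = new LocalCluster();
            cluster.submitTopology("sensor-topology", new Config(), builder.createTopology());
        }
    }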

So far, I found a post on Stack Overflow about an HTTP spout (Storm : Spout for reading data from a port), but sadly I was not allowed to leave a comment or write any private messages (sorry if I missed something about that). I'm not sure how exactly this spout works and wanted to ask for example code (basically, I wanted to know how the whole thing was set up in the topology).

I also tried to use the DRPC functionality of Storm (http://storm.apache.org/releases/1.0.0/Distributed-RPC.html) to get my HTTP requests into the topology, but I wasn't able to progress much further through the documentation and the storm-starter examples so far, because I'm still learning how to use Storm properly. I was really confused about setting up the DRPC server and how to configure the listening for incoming requests.
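
Based on the example in the linked documentation, I think a local-mode setup would look roughly like this (names adapted to my use case; I haven't gotten as far as running it against a real DRPC server):

    import org.apache.storm.Config;
    import org.apache.storm.LocalCluster;
    import org.apache.storm.LocalDRPC;
    import org.apache.storm.drpc.LinearDRPCTopologyBuilder;
    import org.apache.storm.topology.BasicOutputCollector;
    import org.apache.storm.topology.OutputFieldsDeclarer;
    import org.apache.storm.topology.base.BaseBasicBolt;
    import org.apache.storm.tuple.Fields;
    import org.apache.storm.tuple.Tuple;
    import org.apache.storm.tuple.Values;

    public class SensorDrpcTopology {

        // made-up bolt: would store the sensor payload and answer the DRPC request
        public static class StoreSensorValueBolt extends BaseBasicBolt {
            @Override
            public void execute(Tuple tuple, BasicOutputCollector collector) {
                Object requestId = tuple.getValue(0);      // DRPC request id, must be passed through
                String sensorPayload = tuple.getString(1); // the argument given to drpc.execute(...)
                // ... write sensorPayload to a file or database here ...
                collector.emit(new Values(requestId, "stored: " + sensorPayload));
            }

            @Override
            public void declareOutputFields(OutputFieldsDeclarer declarer) {
                declarer.declare(new Fields("id", "result")); // first field has to be the request id
            }
        }

        public static void main(String[] args) throws Exception {
            LinearDRPCTopologyBuilder builder = new LinearDRPCTopologyBuilder("store-sensor-value");
            builder.addBolt(new StoreSensorValueBolt(), 1);

            // local mode: LocalDRPC stands in for the real DRPC servers
            LocalDRPC drpc = new LocalDRPC();
            LocalCluster cluster = new LocalCluster();
            cluster.submitTopology("sensor-drpc", new Config(), builder.createLocalTopology(drpc));

            // one sensor reading would travel into the topology through this call
            System.out.println(drpc.execute("store-sensor-value", "temperature=22.5"));

            cluster.shutdown();
            drpc.shutdown();
        }
    }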

So I wanted to know if someone has also faced this problem and found a solution, or can give me advice on what else I could try.

Would such an HTTP spout (a socket connection, as far as I've seen?) or a DRPC server work?

PS: A code template, other examples, or any other sources of information that could help me understand this topic would also be nice!


2 Answers

1 vote

I would instead write a servlet to consume those HTTP requests and, on receiving a request, write the relevant information to Kafka. You can then use the Kafka spout (I would write my own spout, but that's a whole different question) to read that data and emit it into your topology. The primary benefit of using Kafka as an intermediate staging location is the ability to replay your data by resetting the committed Kafka offsets.
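
A rough sketch of the topology side, assuming the servlet publishes each request body as a string to a Kafka topic called "sensor-values" and you use the storm-kafka module that ships with Storm 1.x (the ZooKeeper address, topic name, and the trivial printing bolt are all placeholders for your own setup):

    import org.apache.storm.Config;
    import org.apache.storm.LocalCluster;
    import org.apache.storm.kafka.BrokerHosts;
    import org.apache.storm.kafka.KafkaSpout;
    import org.apache.storm.kafka.SpoutConfig;
    import org.apache.storm.kafka.StringScheme;
    import org.apache.storm.kafka.ZkHosts;
    import org.apache.storm.spout.SchemeAsMultiScheme;
    import org.apache.storm.topology.BasicOutputCollector;
    import org.apache.storm.topology.OutputFieldsDeclarer;
    import org.apache.storm.topology.TopologyBuilder;
    import org.apache.storm.topology.base.BaseBasicBolt;
    import org.apache.storm.tuple.Tuple;

    public class KafkaSensorTopology {

        // stand-in for your file/database bolt: just prints each message
        public static class PrintBolt extends BaseBasicBolt {
            @Override
            public void execute(Tuple tuple, BasicOutputCollector collector) {
                System.out.println(tuple.getString(0)); // raw message body from Kafka
            }

            @Override
            public void declareOutputFields(OutputFieldsDeclarer declarer) {
                // terminal bolt, nothing to declare
            }
        }

        public static void main(String[] args) throws Exception {
            // ZooKeeper address, topic, and consumer id are assumptions; adjust to your setup
            BrokerHosts hosts = new ZkHosts("localhost:2181");
            SpoutConfig spoutConfig = new SpoutConfig(hosts, "sensor-values", "/sensor-values", "sensor-spout");
            spoutConfig.scheme = new SchemeAsMultiScheme(new StringScheme()); // emit messages as plain strings

            TopologyBuilder builder = new TopologyBuilder();
            builder.setSpout("kafka-spout", new KafkaSpout(spoutConfig), 1);
            builder.setBolt("print", new PrintBolt(), 1).shuffleGrouping("kafka-spout");

            LocalCluster cluster = new LocalCluster();
            cluster.submitTopology("kafka-sensor-topology", new Config(), builder.createTopology());
        }
    }

The consumer offsets this spout keeps in ZooKeeper (under the zkRoot/id given above) are what you would reset to get that replay behavior.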

0 votes

Storm spouts usually pull data from a data source, so what you are talking about is not very common. This is why Chris mentioned using a queuing product like Kafka as a buffer between Storm and your Pi.

It may be possible to do what you are talking about inside a Storm spout. The problem comes when you start to scale up from one machine to many, because your Pi will not know which nodes the Storm workers are running on and therefore won't know where the HTTP server is listening.

I would recommend starting simple. Here is a WordCountTopology I wrote that you can run locally on your machine: storm-stlhug-demo.

To get started, at least run the HTTP server outside of Storm (a minimal sketch of the spout in step 3 follows the list):

  1. Pi does HTTP post to HTTP Server
  2. HTTP Server writes payload to files in a data directory
  3. Storm Spout polls the data directory and processes the files
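
Something like this is what I mean by the spout in step 3 (a minimal sketch; the data directory path and output field name are assumptions, and error handling and acking are left out):

    import java.io.File;
    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.util.Map;
    import org.apache.storm.spout.SpoutOutputCollector;
    import org.apache.storm.task.TopologyContext;
    import org.apache.storm.topology.OutputFieldsDeclarer;
    import org.apache.storm.topology.base.BaseRichSpout;
    import org.apache.storm.tuple.Fields;
    import org.apache.storm.tuple.Values;
    import org.apache.storm.utils.Utils;

    // polls a directory that the HTTP server writes into and emits one tuple per file
    public class DirectoryPollingSpout extends BaseRichSpout {
        private final String dataDir;
        private SpoutOutputCollector collector;

        public DirectoryPollingSpout(String dataDir) {
            this.dataDir = dataDir; // e.g. "/data/sensor-inbox" (assumption)
        }

        @Override
        public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
            this.collector = collector;
        }

        @Override
        public void nextTuple() {
            File[] files = new File(dataDir).listFiles();
            if (files == null || files.length == 0) {
                Utils.sleep(500); // nothing new yet; avoid busy-waiting
                return;
            }
            for (File file : files) {
                try {
                    String payload = new String(Files.readAllBytes(file.toPath()), StandardCharsets.UTF_8);
                    collector.emit(new Values(payload));
                } catch (IOException e) {
                    // a real spout should log and retry; skipped in this sketch
                }
                file.delete(); // naive "mark as processed"; better to move files to an archive directory
            }
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("sensor-payload"));
        }
    }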