7 votes

Context: OS: Linux (Ubuntu), language: C (actually Lua, but this should not matter).

I would prefer a ZeroMQ-based solution, but will accept anything sane enough.

Note: For technical reasons I can not use POSIX signals here.

I have several identical long-living processes on a single machine ("workers").

From time to time I need to deliver a control message to each of these processes via a command-line tool. Example:

$ command-and-control worker-type run-collect-garbage

Each of the workers on this machine should receive a run-collect-garbage message. Note: it would be perfect if the solution also worked for all workers on all machines in the cluster, but I can write that part myself.

This is easily done if I store some information about the running workers. For example, keep their PIDs in a known location and open a control Unix domain socket on a known path with the PID somewhere in it. Or open a TCP socket and store the host and port somewhere.

But this would require careful management of the stored information: e.g., what if a worker process suddenly dies? (Nothing unmanageable, but still extra fuss.) Also, the information needs to be stored somewhere, adding an extra bit of complexity.

Is there a good way to do this in PUB/SUB style? That is, the workers are subscribers, the command-and-control tool is a publisher, and all they know is a single "channel URL", so to speak, where they go for messages.

Additional requirements:

  • Messages to the control channel must wake up workers from the poll (select, whatever) loop.
  • Message delivery must be guaranteed, and it must reach each and every worker that is listening.
  • Worker should have a way to monitor for messages without blocking — ideally by the poll/select/whatever loop mentioned above.
  • Ideally, the worker process should be a "server" in a sense: it should not have to bother with keeping connections to the "channel server" (if any) persistent, etc., or this should be done transparently by the framework.
4 Answers

4 votes

Usually such a pattern requires a proxy for the publisher, i.e. you send to the proxy, which immediately accepts delivery and then reliably forwards it to the end subscriber workers. The ZeroMQ guide covers a few different methods of implementing this.

http://zguide.zeromq.org/page:all

2 votes

Given your requirements, Steve's suggestion does seem the simplest: run a daemon which listens on two known sockets; the workers connect to one, the command tool pushes to the other, and the daemon redistributes messages to the connected workers.

You could do something more complicated that would probably work, by effectively nominating one of the workers. For example, on startup the workers attempt to bind() a PUB ipc:// socket somewhere accessible, like /tmp. The one that wins bind()s a second IPC as a PULL socket and acts as a forwarder device on top of its normal duties; the others connect() to the original IPC. The command-line tool connect()s to the second IPC and pushes its message. The risk there is that the winner dies, leaving a locked file. You could identify this in the command-line tool, rebind, then sleep (to allow the connections to be established). Still, that's all a little bit complex; I think I'd go with a proxy!

0 votes

I think what you're describing would fit well with a gearmand/supervisord implementation.

Gearman is a great task queue manager, and supervisord would let you make sure that the process(es) are all running. It's TCP-based too, so you could have clients/workers on different machines.

http://gearman.org/

http://supervisord.org/

I recently set something up with multiple gearmand nodes linked to multiple workers so that there's no single point of failure.

Edit: Sorry, my bad; I just re-read and saw that this might not be ideal.

Redis has some nice and simple-looking pub/sub functionality that I haven't used yet, but it sounds promising.

0 votes

Use multicast PUB/SUB. You'll have to make sure the pgm option is compiled into your ZeroMQ distribution (see man 7 zmq_pgm).
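For reference, a pgm/epgm endpoint names the interface and the multicast group, per zmq_pgm(7); the interface name and group address here are illustrative, not taken from the question:

```c
/* "interface;multicast-group:port" - eth0 and 239.192.1.1:5555 are
 * placeholder values; epgm:// is PGM encapsulated in UDP. */
zmq_connect(sub, "epgm://eth0;239.192.1.1:5555");
```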