2 votes

Imagine we have the following problem:

  1. We have HTTP clients that send requests to our software. So we have one process that is always available to them and stores their requests in a queue.
  2. We need to dispatch these requests to a machine that is in our internal network (again via HTTP).
  3. Such a machine is not always available. It is started (and stopped when the queue is empty) on demand by our software (again, an HTTP request to a "manager" machine).
  4. We have several (or lots) of the above.

So basically, we have one logical entity that, for the sake of argument, we will call a "job queue". Every job queue consists of several (heterogeneous) processes: one that implements the actual queue and is always available (doesn't block); one that manages a worker machine; several workers (spawned on demand) that take entries off the queue, try to send them to the worker machine, work around errors, maybe return unsuccessful attempts to the queue to be retried, etc.; and maybe also a "manager" process that coordinates the work of the above. And we have lots of "job queues", which all consist of lots of processes.
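To make the shape concrete, here is a rough sketch of the always-available queue part as a gen_server (module and function names like jq_queue are made up purely for illustration):

    %% Rough sketch of the always-available queue process.
    %% Module and function names are made up for illustration only.
    -module(jq_queue).
    -behaviour(gen_server).

    -export([start_link/1, enqueue/2, dequeue/1]).
    -export([init/1, handle_call/3, handle_cast/2]).

    start_link(Name) ->
        gen_server:start_link({local, Name}, ?MODULE, [], []).

    %% Called from the HTTP handler; never blocks on the worker machine.
    enqueue(Name, Request) ->
        gen_server:cast(Name, {enqueue, Request}).

    %% Called by a worker process when it wants the next job.
    dequeue(Name) ->
        gen_server:call(Name, dequeue).

    init([]) ->
        {ok, queue:new()}.

    handle_cast({enqueue, Request}, Q) ->
        {noreply, queue:in(Request, Q)}.

    handle_call(dequeue, _From, Q) ->
        case queue:out(Q) of
            {{value, Request}, Q2} -> {reply, {ok, Request}, Q2};
            {empty, Q2}            -> {reply, empty, Q2}
        end.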

NOTE: this may not be the perfect solution to this exact problem, but let's assume that it is. My question is not about how to solve the problem, but about how to manage such "groups" of processes that represent logical entities.

So, how do you represent this in OTP? How many supervision trees do you have? Do you share supervisors between "job queue" entities, or do you have a supervisor per logical entity? And how do you manage the whole thing?

I have a guess, but this is quite a tricky problem (I have already tried implementing it in several different ways), so I won't share my (maybe not so bad) idea for now.


2 Answers

1 vote

I would use a dedicated supervisor for each logical component (by "logical" I guess you mean: http-workers, manager, dispatchers). Each of those classes would get its own supervisor. I like this approach because I can benefit from the tools that come with supervisors (count_children, seeing the tree in i(), etc.), and it separates the system nicely.
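For example (just a sketch, with placeholder module names), a per-"job queue" supervisor that owns one child, or one sub-supervisor, per class could look like this:

    %% Sketch: one supervisor per "job queue" entity, with one child
    %% (or one sub-supervisor) per logical class of process.
    %% All module names below are placeholders.
    -module(job_queue_sup).
    -behaviour(supervisor).

    -export([start_link/1, init/1]).

    start_link(QueueName) ->
        supervisor:start_link(?MODULE, [QueueName]).

    init([QueueName]) ->
        {ok, {{one_for_one, 5, 10},
              [{queue,
                {jq_queue, start_link, [QueueName]},
                permanent, 5000, worker, [jq_queue]},
               {machine_manager,
                {jq_machine_manager, start_link, [QueueName]},
                permanent, 5000, worker, [jq_machine_manager]},
               %% sub-supervisor (e.g. simple_one_for_one) for on-demand workers
               {worker_sup,
                {jq_worker_sup, start_link, [QueueName]},
                permanent, infinity, supervisor, [jq_worker_sup]}]}}.

Calling supervisor:count_children/1 on such a supervisor, or looking at the tree in i(), then gives you the per-entity view I mentioned.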

Gproc, mentioned by @MinimeDJ, and the sync/async stuff are a completely different matter.

I don't think it is the best architecture if the system you described needs gproc. Redesign it to have as many stateless layers as possible. E.g. instead of maintaining dispatchers (a push model), try a pull model: pull tasks from the back-end machine. This makes the queues stateless, you get rid of the dispatchers, and if anything goes wrong the back-end layer simply puts the task back into some queue. Moreover, the managers are reduced to an API to the queues plus some stats collectors. The load of the back-end workers is measured and controlled (locally!) in each of those heterogeneous back-end modules.
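A pull-model back-end worker could then be as simple as the loop below (jq_queue:dequeue/1, jq_queue:enqueue/2 and handle/1 are assumed placeholder functions, not a real API):

    %% Sketch of a pull-model back-end worker: it asks the queue for work
    %% instead of being pushed to, so the queue stays stateless about workers.
    %% jq_queue:dequeue/1, jq_queue:enqueue/2 and handle/1 are placeholders.
    -module(jq_pull_worker).
    -export([loop/1]).

    loop(QueueName) ->
        case jq_queue:dequeue(QueueName) of
            {ok, Task} ->
                try
                    handle(Task)                          %% HTTP call to the worker machine
                catch
                    _:_ ->
                        jq_queue:enqueue(QueueName, Task) %% put it back on failure
                end,
                loop(QueueName);
            empty ->
                timer:sleep(1000),                        %% back off when the queue is empty
                loop(QueueName)
        end.

    handle(_Task) ->
        ok.  %% placeholder for the real HTTP request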

0 votes

To start from a very high level: we also have a system that consists of many special blocks, and our first architecture was something similar to yours. Instead of HTTP we used RabbitMQ, which I believe is much more convenient in terms of message exchange.

But before the final release we realized that it would be a real challenge to maintain the whole system in production.

So, we redesigned it again. Now we represent each logical block as a gen_server process. Each process has a unique name and is registered in gproc. Since gproc can span many nodes, we have a fault-tolerant system that is very easy to manage.
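Roughly, each block is started like this (the module name and the gproc key {job_queue, Id} are just examples):

    %% Sketch: a logical block as a gen_server registered under a unique
    %% gproc name. The key {job_queue, Id} is only an example.
    -module(mom_block).
    -behaviour(gen_server).

    -export([start_link/1, call/2]).
    -export([init/1, handle_call/3, handle_cast/2]).

    %% {n, l, Key} = unique local name; with gproc's distributed mode
    %% you can use {n, g, Key} to register across nodes.
    start_link(Id) ->
        gen_server:start_link({via, gproc, {n, l, {job_queue, Id}}},
                              ?MODULE, [Id], []).

    call(Id, Msg) ->
        gen_server:call({via, gproc, {n, l, {job_queue, Id}}}, Msg).

    init([Id]) ->
        {ok, Id}.

    handle_call(Msg, _From, State) ->
        {reply, {ok, Msg}, State}.

    handle_cast(_Msg, State) ->
        {noreply, State}.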

So, I would say that we have a Manageable Object Model (we call it MOM because we really love it).

So, to me your system seems overcomplicated. I don't know if my answer is useful at all, but sometimes it's worth thinking about your system in a way you never expected at the beginning. I hope you will find a way to manage it easily.