3
votes

I'm working with Elixir but I believe this question applies to Erlang as well.

I'm working on a system that might create tens of thousands of process groups of the same kind. Each group will have 2 workers and a local supervisor of its own. The question is: who will supervise the local supervisors?

I can imagine two strategies:

  1. One big supervisor that handles all the local supervisors. This is simple, but I believe the supervisor would need to traverse its huge list of children whenever something happens to a child, which would be a heavy operation.
  2. A partitioned tree: for example, a set of intermediate supervisors, each supervising about 1000 local supervisors, and a global supervisor handling the intermediate ones. To create a new group, the global supervisor would need to find the intermediate supervisor with the fewest children and delegate the creation to it.
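To make strategy 2 concrete, here is a rough sketch using `PartitionSupervisor` (available since Elixir 1.14) in front of a set of `DynamicSupervisor`s. Note that this is a swapped-in technique: it routes by hashing a key rather than by picking the intermediate supervisor with the fewest children. The module name `PartitionedGroup`, the registered name `Groups.Partitions`, the partition count of 8, and the `Agent` workers are all placeholders invented for the illustration.

```elixir
defmodule PartitionedGroup do
  # Hypothetical local supervisor: one per group, owning its two workers.
  use Supervisor

  def start_link(group_id), do: Supervisor.start_link(__MODULE__, group_id)

  @impl true
  def init(group_id) do
    # Two placeholder workers; real workers would be your own modules.
    children = [
      %{id: :worker_a, start: {Agent, :start_link, [fn -> {group_id, :a} end]}},
      %{id: :worker_b, start: {Agent, :start_link, [fn -> {group_id, :b} end]}}
    ]

    Supervisor.init(children, strategy: :one_for_one)
  end
end

# The "global" supervisor: starts 8 DynamicSupervisor partitions
# (the intermediate supervisors of strategy 2).
{:ok, _} =
  PartitionSupervisor.start_link(
    child_spec: DynamicSupervisor,
    name: Groups.Partitions,
    partitions: 8
  )

# The group id is used as the routing key, so a group always lands
# on the same partition.
start_group = fn group_id ->
  DynamicSupervisor.start_child(
    {:via, PartitionSupervisor, {Groups.Partitions, group_id}},
    {PartitionedGroup, group_id}
  )
end

for id <- 1..1_000, do: {:ok, _pid} = start_group.(id)

# Count the local supervisors across all partitions.
total =
  Groups.Partitions
  |> PartitionSupervisor.which_children()
  |> Enum.map(fn {_id, pid, _type, _mods} ->
    DynamicSupervisor.count_children(pid).active
  end)
  |> Enum.sum()

IO.inspect(total, label: "local supervisors")
```

Hash-based routing avoids having to ask every intermediate supervisor for its child count before each start, at the cost of only approximately even distribution.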

Does either make sense, or is there another way? Any advice is welcome.


3 Answers

3
votes

"It depends".

"huge list" and "thousands" really are in different realms. Simple iteration is fast on modern machines. Up to high five, low six items I would have no qualms with a system that regularly has to traverse a list this size, and probably over that I wouldn't really care either:

    iex(2)> list = Enum.to_list 1..1_000_000; :timer.tc(fn -> Enum.sum list end)
    {24497, 500000500000}

(that is about 25 ms for traversing a million-element list plus some arithmetic - I'm usually happy if a crashed process gets restarted with such a small delay)

Of course - at the end of the day you're expected to do your own performance testing, compare the outcomes with the expected local supervisor crash rate, look up your system's requirements, and compare all these figures to come to an answer.

In the meantime, use the simplest thing that can possibly work: a single global supervisor monitoring a flat hierarchy.
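As a minimal sketch of that flat layout, assuming one `DynamicSupervisor` at the top and one small `Supervisor` per group: the names `FlatGroups.Root`, `FlatGroups.Group` and `FlatGroups.Worker` are made up for the example, and the `Agent` worker only stands in for whatever the real workers do.

```elixir
defmodule FlatGroups.Worker do
  # Hypothetical worker; an Agent holding its argument as state.
  use Agent

  def start_link(arg), do: Agent.start_link(fn -> arg end)
end

defmodule FlatGroups.Group do
  # Local supervisor: one per group, with exactly two workers.
  use Supervisor

  def start_link(group_id), do: Supervisor.start_link(__MODULE__, group_id)

  @impl true
  def init(group_id) do
    children = [
      # Both children use the same module, so they need distinct ids.
      Supervisor.child_spec({FlatGroups.Worker, {group_id, :a}}, id: :worker_a),
      Supervisor.child_spec({FlatGroups.Worker, {group_id, :b}}, id: :worker_b)
    ]

    Supervisor.init(children, strategy: :one_for_one)
  end
end

# The single global supervisor for all local supervisors.
{:ok, _} = DynamicSupervisor.start_link(strategy: :one_for_one, name: FlatGroups.Root)

# Start 1_000 groups; raise the number to test at your real scale.
for id <- 1..1_000 do
  {:ok, _pid} = DynamicSupervisor.start_child(FlatGroups.Root, {FlatGroups.Group, id})
end

counts = DynamicSupervisor.count_children(FlatGroups.Root)
IO.inspect(counts, label: "global supervisor")
```

Wrapping the `for` loop in `:timer.tc/1` and killing a few local supervisors by hand is a cheap way to get the performance figures mentioned above for your own machine.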

0
votes

Approach one is perfectly efficient. The global supervisor would not need to traverse anything as long as every group has its own local supervisor and the latter is not expected to crash.

When something happens to a leaf worker, its local supervisor will take care of restarting it, and the global supervisor won't even know that something went wrong down in the tree.
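A small experiment along these lines, assuming a `DynamicSupervisor` as the global supervisor and two throwaway `Agent` workers under a plain `Supervisor` (all names here are illustrative only): kill one worker and check that the local supervisor replaces it while the global supervisor still holds the same child.

```elixir
# Two placeholder workers for one group.
workers = [
  %{id: :worker_a, start: {Agent, :start_link, [fn -> :a end]}},
  %{id: :worker_b, start: {Agent, :start_link, [fn -> :b end]}}
]

{:ok, global} = DynamicSupervisor.start_link(strategy: :one_for_one)

# Child spec for the local supervisor of this group.
local_spec = %{
  id: :local,
  start: {Supervisor, :start_link, [workers, [strategy: :one_for_one]]},
  type: :supervisor
}

{:ok, local} = DynamicSupervisor.start_child(global, local_spec)

# Look up the current pid of :worker_a under the local supervisor.
find_worker = fn ->
  {_id, pid, _type, _mods} =
    List.keyfind(Supervisor.which_children(local), :worker_a, 0)

  pid
end

# Poll until the local supervisor reports a fresh pid for the worker.
wait_for_restart = fn wait, old ->
  case find_worker.() do
    pid when is_pid(pid) and pid != old ->
      pid

    _ ->
      Process.sleep(10)
      wait.(wait, old)
  end
end

old = find_worker.()
Process.exit(old, :kill)
new = wait_for_restart.(wait_for_restart, old)

IO.inspect({old, new}, label: "worker restarted locally")
IO.inspect(DynamicSupervisor.which_children(global), label: "global children")
```

The global supervisor only receives an exit signal when the local supervisor itself terminates, for example after exceeding its restart intensity.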

If, OTOH, you expect your local supervisors to crash from time to time on purpose, each local supervisor should be supervised by its own, say, intermediate supervisor, which will take care of its restarts. The global supervisor will in this case manage these intermediate supervisors, and everything will be cool again.

0
votes

Use director in ETS mode and don't worry about the number of children. In ETS mode, you can also read some information about the children directly from the table.