What you probably overlook is the famous “let it crash” philosophy, which makes process crashes and restarts first-class citizens in OTP. We treat a process crash not as a failure, but as an opportunity to start over from a clean state without having to handle errors manually.
The main reason is to allow finer-grained control over what gets restarted on failure. For that, we have restart strategies. Or, as @Andree restated it in the comments:
by organizing supervisions in hierarchies, we allow finer-grained control over how the system should respond should a subset of the system fail
Imagine an application that has a process responsible for a remote connection and a bunch of processes that all use this resource. When the connection process crashes, its supervisor restarts it in any case, but its pid changes, meaning all the processes that relied on the old pid have to be restarted as well. With the :rest_for_one strategy, this comes out of the box, as long as the dependent processes are listed after the connection process.
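A minimal sketch of that scenario, runnable as a script. The module names Demo.Connection and Demo.Worker are hypothetical stand-ins, and the "connection" is just an empty GenServer; the point is only to show that killing the first child also restarts the child listed after it.

```elixir
defmodule Demo.Connection do
  # Hypothetical stand-in for a process that owns a remote connection.
  use GenServer

  def start_link(_opts), do: GenServer.start_link(__MODULE__, :ok, name: __MODULE__)

  @impl true
  def init(:ok), do: {:ok, %{}}
end

defmodule Demo.Worker do
  # Hypothetical consumer that captures the connection pid once, at startup.
  use GenServer

  def start_link(_opts), do: GenServer.start_link(__MODULE__, :ok, name: __MODULE__)

  def conn_pid, do: GenServer.call(__MODULE__, :conn_pid)

  @impl true
  def init(:ok), do: {:ok, Process.whereis(Demo.Connection)}

  @impl true
  def handle_call(:conn_pid, _from, conn), do: {:reply, conn, conn}
end

# Order matters: with :rest_for_one, a crash restarts the crashed child
# and every child listed after it.
children = [Demo.Connection, Demo.Worker]
{:ok, _sup} = Supervisor.start_link(children, strategy: :rest_for_one)

old_conn = Process.whereis(Demo.Connection)
old_worker = Process.whereis(Demo.Worker)

# Simulate a connection failure and give the supervisor time to react.
Process.exit(old_conn, :kill)
Process.sleep(100)

new_conn = Process.whereis(Demo.Connection)
new_worker = Process.whereis(Demo.Worker)

# Both processes were replaced, and the worker now holds the new pid.
IO.inspect({old_conn != new_conn, old_worker != new_worker})
IO.inspect(Demo.Worker.conn_pid() == new_conn)
```

With :one_for_one instead, only the connection would be restarted and the worker would keep holding a dead pid.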
Another approach to this particular example would be to manage the connection in a process supervised in another part of the tree and, upon connection issues, manually crash the supervisor of the pools using this connection to reinitialize all of them.
Even more, we might want to manually crash the process handling this connection to reinitialize it: instead of writing defensive code like if no_conn, do: reload_config_and_restart_connection, we just let it crash and get reinitialized by the supervision tree with the new, proper config.
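As a rough illustration of crashing on purpose, the sketch below reads its configuration in init/1 and simply stops when it is told the connection is gone. Everything named here is an assumption made for the example: the Demo.ConfiguredConn module, the :demo application env key, and the :connection_lost message.

```elixir
defmodule Demo.ConfiguredConn do
  # Hypothetical connection owner used only for this sketch.
  use GenServer

  def start_link(_opts), do: GenServer.start_link(__MODULE__, :ok, name: __MODULE__)

  def host, do: GenServer.call(__MODULE__, :host)

  @impl true
  def init(:ok) do
    # Configuration is read on every (re)start, so a restart picks up changes.
    {:ok, %{host: Application.get_env(:demo, :host, "localhost")}}
  end

  @impl true
  def handle_call(:host, _from, state), do: {:reply, state.host, state}

  @impl true
  def handle_info(:connection_lost, state) do
    # No recovery logic here: stop abnormally and let the supervisor restart us.
    {:stop, :connection_lost, state}
  end
end

{:ok, _sup} = Supervisor.start_link([Demo.ConfiguredConn], strategy: :one_for_one)
first_host = Demo.ConfiguredConn.host()

# The config changes while the process is running...
Application.put_env(:demo, :host, "db.internal")

# ...and the next failure reinitializes the process with the new value.
send(Demo.ConfiguredConn, :connection_lost)
Process.sleep(100)

second_host = Demo.ConfiguredConn.host()
IO.inspect({first_host, second_host})
```

The abnormal stop is logged as an error, which is usually what you want: the failure stays visible while the recovery itself needs no code.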
Last but not least, a supervisor that cannot keep its children alive (they crash more often than its max_restarts within max_seconds limit allows) crashes as well, propagating the failure up. That way we might reinitialize a whole branch of the supervision tree without writing a line of code.
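A small sketch of that escalation, under a few assumptions. In OTP a supervisor gives up and exits once its children crash more often than its restart intensity allows; here max_restarts: 0 is chosen purely to make the very first crash trigger that. The registered names (:demo_leaf, :demo_inner_sup, :demo_sibling) are made up for the example, and plain Agents stand in for real workers.

```elixir
# Inner supervisor: max_restarts: 0 means the first child crash exceeds the
# restart intensity, so the supervisor itself shuts down.
inner = %{
  id: :inner_sup,
  type: :supervisor,
  start:
    {Supervisor, :start_link,
     [
       [%{id: :leaf, start: {Agent, :start_link, [fn -> :state end, [name: :demo_leaf]]}}],
       [strategy: :one_for_one, max_restarts: 0, name: :demo_inner_sup]
     ]}
}

sibling = %{
  id: :sibling,
  start: {Agent, :start_link, [fn -> :state end, [name: :demo_sibling]]}
}

# Top supervisor: :one_for_all restarts every child when any one of them exits.
{:ok, _top} = Supervisor.start_link([sibling, inner], strategy: :one_for_all)

names = [:demo_sibling, :demo_inner_sup, :demo_leaf]
pids_before = Enum.map(names, &Process.whereis/1)

# Crash only the leaf; the failure escalates through the inner supervisor.
Process.exit(Process.whereis(:demo_leaf), :kill)
Process.sleep(200)

pids_after = Enum.map(names, &Process.whereis/1)

# Every process in the branch has been replaced by a fresh one.
IO.inspect(Enum.zip(pids_before, pids_after))
```

In a real tree you would keep a more forgiving restart intensity and let escalation happen only when local restarts repeatedly fail.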