I'm trying to understand what's happening here:
I have a supervisor that keeps restarting one child in a cycle without triggering the MaxR/MaxT
mechanism. The child just crashes slowly enough that it never trips the restart rate limit.
There is supposed to be another mechanism that uses supervisor:which_children/1, delete_child/2 and start_child/2 to adapt the set of children to reality (it scans for USB devices and tries to keep one supervisor child per device found).
This would normally act as a safety net for the rate limitation, but strangely it looks like the mechanism that deletes and starts children is not called at all.
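For reference, the reconciliation step described above could look roughly like the sketch below. The module name `device_sync`, the `sync/3` and `diff/2` functions, and the `SpecFun` callback are hypothetical names for illustration, not my actual code. It also calls supervisor:terminate_child/2 before delete_child/2, since delete_child/2 returns {error, running} for a child that is still alive.

```erlang
-module(device_sync).
-export([sync/3, diff/2]).

%% Hypothetical helper. Found is the list of child ids for the devices the
%% scan detected; SpecFun builds a child spec for a single id.
sync(Sup, Found, SpecFun) ->
    %% which_children/1 returns {Id, Child, Type, Modules} tuples.
    Running = [Id || {Id, _Child, _Type, _Mods} <- supervisor:which_children(Sup)],
    {ToStart, ToStop} = diff(Found, Running),
    %% Children whose device disappeared: stop first, then remove the spec.
    lists:foreach(fun(Id) ->
                          _ = supervisor:terminate_child(Sup, Id),
                          _ = supervisor:delete_child(Sup, Id)
                  end, ToStop),
    %% Newly found devices: add one child each.
    lists:foreach(fun(Id) ->
                          _ = supervisor:start_child(Sup, SpecFun(Id))
                  end, ToStart),
    {ToStart, ToStop}.

%% Pure set difference, kept separate so it can be checked without a supervisor.
diff(Found, Running) ->
    {Found -- Running, Running -- Found}.
```

Note that every one of these supervisor calls is a synchronous call into the supervisor process, so the whole sync depends on the supervisor being able to answer.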
To find out what's going on, I called supervisor:which_children/1 from the shell, and it looks like the call just blocks and never returns.
Can it be that calls to the supervisor are blocked while it is busy trying to restart a child?
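A small experiment along these lines can show the effect in isolation. This is only a sketch under assumptions: the module `slow_start_demo` and its deliberately slow `slow_start/0` function are made up for the test and have nothing to do with my real children. The supervisor runs the child's start function inside its own process, so while that function sleeps the supervisor cannot serve which_children/1.

```erlang
-module(slow_start_demo).
-behaviour(supervisor).
-export([run/0, slow_start/0, init/1]).

%% Returns how long which_children/1 took, in milliseconds.
run() ->
    {ok, Sup} = supervisor:start_link(?MODULE, []),
    Spec = {slow, {?MODULE, slow_start, []}, temporary, 1000, worker, [?MODULE]},
    %% start_child/2 is itself a blocking call, so issue it from another process.
    spawn(fun() -> supervisor:start_child(Sup, Spec) end),
    %% Give the supervisor time to enter the slow start function.
    timer:sleep(100),
    T0 = erlang:monotonic_time(millisecond),
    _ = supervisor:which_children(Sup),
    erlang:monotonic_time(millisecond) - T0.

%% Hypothetical start function that takes two seconds to return.
%% It is executed by the supervisor process itself.
slow_start() ->
    timer:sleep(2000),
    Pid = spawn_link(fun() -> receive stop -> ok end end),
    {ok, Pid}.

init([]) ->
    {ok, {{one_for_one, 1, 5}, []}}.
```

If the supervisor answered calls while a child was starting, `slow_start_demo:run()` would return almost immediately; instead the measured time should be close to the remaining sleep of the start function.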
Addendum:
It looks like the crash happens during child start:
=SUPERVISOR REPORT==== 29-Mar-2011::21:36:20 ===
     Supervisor: {local,gateway_sup}
     Context:    start_error
     Reason:     {'EXIT',{timeout,{gen_server,call,[<0.155.0>,late_init]}}}
     Offender:   [{pid,<0.76.0>},
                  {name,gw_3_5},
                  {mfa,{channel,start_link,
                                [[{gateways,[{left,108},{right,103}]}],
                                 {3,5}]}},
                  {restart_type,transient},
                  {shutdown,10000},
                  {child_type,worker}]
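The report above shows a gen_server:call for late_init timing out while the child is being started. One common way to keep the start function short is to defer the slow part until after init/1 has returned, for example by casting a message to self(). The sketch below assumes that the late initialisation can run asynchronously; the module `late_init_demo`, the `ready/1` function and the 200 ms sleep are placeholders, not the real channel module.

```erlang
-module(late_init_demo).
-behaviour(gen_server).
-export([start_link/0, ready/1]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

start_link() ->
    gen_server:start_link(?MODULE, [], []).

%% Ask the server whether its late initialisation has finished.
ready(Pid) ->
    gen_server:call(Pid, ready).

init([]) ->
    %% Return quickly so the supervisor is not held up; the slow part is
    %% queued as a message to ourselves and handled right after init/1.
    gen_server:cast(self(), late_init),
    {ok, #{ready => false}}.

handle_call(ready, _From, State = #{ready := Ready}) ->
    {reply, Ready, State}.

handle_cast(late_init, State) ->
    %% Placeholder for the slow work (assumption: it may run asynchronously).
    timer:sleep(200),
    {noreply, State#{ready := true}}.

handle_info(_Msg, State) ->
    {noreply, State}.
```

Because the cast is placed in the mailbox before start_link/0 returns, a later call such as `late_init_demo:ready(Pid)` is only handled once the late initialisation has completed.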
gen_server:call in the start_link function of the child? – Adam Lindberg
init function instead? Seems that there may be a risk of deadlock here... – Adam Lindberg
self() in the gen_server process to get its own pid. – Adam Lindberg