The supervisor:

-module(mod_guild_chapter_sup).
-include("guild_dungeon.hrl").

-behaviour(supervisor).

%% API
-export([start_link/0]).

%% Supervisor callbacks
-export([init/1]).

-define(SERVER, ?MODULE).

%%%===================================================================
%%% API functions
%%%===================================================================

%%--------------------------------------------------------------------
%% @doc
%% Starts the supervisor
%%
%% @spec start_link() -> {ok, Pid} | ignore | {error, Error}
%% @end
%%--------------------------------------------------------------------
start_link() ->
    supervisor:start_link({local, ?SERVER}, ?MODULE, []).

%%%===================================================================
%%% Supervisor callbacks
%%%===================================================================

%%--------------------------------------------------------------------
%% @private
%% @doc
%% Whenever a supervisor is started using supervisor:start_link/[2,3],
%% this function is called by the new process to find out about
%% restart strategy, maximum restart frequency and child
%% specifications.
%%
%% @spec init(Args) -> {ok, {SupFlags, [ChildSpec]}} |
%%                     ignore |
%%                     {error, Reason}
%% @end
%%--------------------------------------------------------------------
init([]) ->
    RestartStrategy = simple_one_for_one,
    MaxRestarts = 1000,
    MaxSecondsBetweenRestarts = 3600,

    SupFlags = {RestartStrategy, MaxRestarts, MaxSecondsBetweenRestarts},

    Restart = transient,
    Shutdown = 60000,
    Type = worker,

    ModGuildChapter = {'guild_chapter', {'mod_guild_chapter', start_link, []},
                       Restart, Shutdown, Type, ['mod_guild_chapter']},

    {ok, {SupFlags, [ModGuildChapter]}}.

The child:

-module(mod_guild_chapter).

-record(state, {}).

start_link(GuildId, ChapterId) ->
    gen_server:start_link(?MODULE, [GuildId, ChapterId], []).

init([GuildId, ChapterId]) ->
    case condition() of
        true -> ignore;
        false -> {ok, #state{}}
    end.

...omit other callbacks...

supervisor:which_children(mod_guild_chapter_sup):

[{undefined,<0.9635.0>,worker,[mod_guild_chapter]},
 {undefined,<0.9539.0>,worker,[mod_guild_chapter]},
 {undefined,<0.9475.0>,worker,[mod_guild_chapter]},
 {undefined,<0.9493.0>,worker,[mod_guild_chapter]},
 {undefined,<0.9654.0>,worker,[mod_guild_chapter]},
 {undefined,undefined,worker,[mod_guild_chapter]},
 {undefined,<0.9658.0>,worker,[mod_guild_chapter]},
 {undefined,<0.9517.0>,worker,[mod_guild_chapter]},
 {undefined,<0.9567.0>,worker,[mod_guild_chapter]}]

The exception raised when the supervisor receives a shutdown:

2015-07-03 14:56:33 =CRASH REPORT====
  crasher:
    initial call: mod_guild_chapter:init/1
    pid: <0.9475.0>
    registered_name: []
    exception exit: {{function_clause,[{supervisor,'-monitor_dynamic_children/2-fun-1-',[undefined,[100062,10003],{{set,3,16,16,8,80,48,{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},{{[],[],[],[],[],[],[],[<0.9658.0>],[],[],[<0.9517.0>],[],[<0.9567.0>],[],[],[]}}},{dict,0,16,16,8,80,48,{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},{{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}}}}],[{file,"supervisor.erl"},{line,992}]},{dict,fold_bucket,3,[{file,"dict.erl"},{line,441}]},{dict,fold_seg,4,[{file,"dict.erl"},{line,437}]},{dict,fold_segs,4,[{file,"dict.erl"},{line,433}]},{supervisor,terminate_dynamic_children,3,[{file,"supervisor.erl"},{line,959}]},{gen_server,terminate,6,[{file,"gen_server.erl"},{line,719}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,239}]}]},[{gen_server,terminate,6,[{file,"gen_server.erl"},{line,744}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,239}]}]}
    ancestors: [mod_guild_chapter_sup,mod_guild_sup,yg_sup,<0.82.0>]
    messages: []
    links: []
    dictionary: []
    trap_exit: true
    status: running
    heap_size: 1598
    stack_size: 27
    reductions: 6586
  neighbours:

As you can see, which_children lists a child with an undefined pid, which should not be there.

Two places in the documentation explain this behaviour, but they conflict with each other.

  1. The child specification will be kept by the supervisor if the child is not temporary even if it returns ignore.

start defines the function call used to start the child process. It must be a module-function-arguments tuple {M,F,A} used as apply(M,F,A).

The start function must create and link to the child process, and must return {ok,Child} or {ok,Child,Info} where Child is the pid of the child process and Info an arbitrary term which is ignored by the supervisor.

The start function can also return ignore if the child process for some reason cannot be started, in which case the child specification will be kept by the supervisor (unless it is a temporary child) but the non-existing child process will be ignored.

  2. For a simple_one_for_one supervisor, no child is added to the supervisor if the start function returns ignore.

If the child process start function returns ignore, the child specification is added to the supervisor (unless the supervisor is a simple_one_for_one supervisor, see below), the pid is set to undefined and the function returns {ok,undefined}.

In the case of a simple_one_for_one supervisor, when a child process start function returns ignore the functions returns {ok,undefined} and no child is added to the supervisor.
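
The behaviour described in the second passage can be checked with a small stand-alone module. This is only a sketch: the module name ignore_demo and the start function start_worker/1 are hypothetical and not part of the code above, and the result for which_children is expected to match the quoted documentation only on an OTP release that actually behaves as documented.

```erlang
-module(ignore_demo).
-behaviour(supervisor).

-export([run/0, start_worker/1]).
-export([init/1]).

%% Hypothetical start function: returns ignore on request, otherwise
%% spawns a process linked to the supervisor that waits for a stop message.
start_worker(ignore) ->
    ignore;
start_worker(run) ->
    {ok, spawn_link(fun() -> receive stop -> ok end end)}.

%% A simple_one_for_one supervisor with a single transient child spec,
%% written in the same tuple format as mod_guild_chapter_sup.
init([]) ->
    Child = {worker, {?MODULE, start_worker, []},
             transient, 5000, worker, [?MODULE]},
    {ok, {{simple_one_for_one, 10, 60}, [Child]}}.

%% Starts the supervisor, asks for one ignored and one real child, and
%% returns what the supervisor reports so the caller can inspect it.
run() ->
    {ok, Sup} = supervisor:start_link(?MODULE, []),
    Ignored = supervisor:start_child(Sup, [ignore]),
    {ok, Pid} = supervisor:start_child(Sup, [run]),
    Children = supervisor:which_children(Sup),
    {Sup, Ignored, Pid, Children}.
```

Calling ignore_demo:run() from the shell and looking at the last element of the result shows whether the running release keeps an entry with an undefined pid for the ignored child.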

I am confused by the documentation. I chose the transient restart type because the child should be restarted when it crashes. How can I avoid this exception?

1 Answer

OK, I read your question again. The documentation says that if the child's start function returns ignore, then this child is not started. For a simple_one_for_one supervisor the child specification is also not kept (which is logical, because only one specification is saved when the supervisor starts). This means that the line:

{undefined,undefined,worker,[mod_guild_chapter]},

just says that no child process has been started. Therefore this entry cannot be the reason for your exception. The crash report gives you the pid of the crashed process, which is <0.9475.0>. You can find it in the list returned by which_children. This is the process to inspect to find the reason for your crash.
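
To separate the live children from entries without a running process before inspecting them, a small helper along these lines may be useful. It is a sketch only; the module name sup_inspect and the function partition_children/1 are made up for illustration and are not part of OTP.

```erlang
-module(sup_inspect).

-export([partition_children/1]).

%% Takes the list returned by supervisor:which_children/1 and splits it
%% into {Live, NotRunning}. An entry counts as live only if its second
%% element is a pid; entries whose pid is undefined or restarting end up
%% in the second list.
partition_children(Children) ->
    lists:partition(
      fun({_Id, Pid, _Type, _Modules}) -> is_pid(Pid) end,
      Children).
```

For example, sup_inspect:partition_children(supervisor:which_children(mod_guild_chapter_sup)) returns the children that can be examined further, for instance with erlang:process_info/1, alongside the entries that have no process behind them.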