Distributed Erlang: multicall exceeds requested timeout

Question

We run a distributed Erlang cluster, and I am now testing how it behaves during net splits.

To get information from all nodes of the cluster I use gen_server:multi_call/4 with an explicit timeout. I need the information from the available nodes as soon as possible, so the timeout is fairly small (about 3000 ms). Here is an example call:

Timeout = 3000,
Nodes = AllConfiguredNodes,
gen_server:multi_call(Nodes, broker, get_score, Timeout).

I expect this call to return within Timeout ms, but during a net split it does not: it takes approximately 8 seconds.
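
The overshoot can be checked by timing the call. A minimal sketch, assuming a hypothetical helper module multicall_timing (not part of OTP); timer:tc/1 and gen_server:multi_call/4 are the only library calls used:

```erlang
-module(multicall_timing).
-export([timed_multi_call/4]).

%% Hypothetical helper: runs gen_server:multi_call/4 and reports how long
%% the call actually took, so it can be compared with TimeoutMs.
%% Returns {ElapsedMs, {Replies, BadNodes}}.
timed_multi_call(Nodes, Name, Request, TimeoutMs) ->
    {Micros, Result} =
        timer:tc(fun() ->
                         gen_server:multi_call(Nodes, Name, Request, TimeoutMs)
                 end),
    {Micros div 1000, Result}.
```

For the call above this would be multicall_timing:timed_multi_call(Nodes, broker, get_score, 3000); an elapsed time well above 3000 ms indicates time spent outside the timeout itself.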

I found that the multi_call request is stalled for an additional 5 seconds in the call to erlang:monitor(process, {Name, Node}), before the request is even sent.

I really do not care whether some node fails to reply, is busy, or is unavailable; I can use any other node. But because of this stall I am forced to wait while the Erlang VM tries to establish a new connection to the dead or unavailable node.

The question is: do you know of a solution that prevents this stall? Or is there another RPC mechanism that suits my situation?

Jr0 Jr0 · Accepted Answer · 2017-11-16T18:01:46

I'm not sure if I totally understand the problem you are trying to solve, but if it is to get all the answers that can be retrieved in X amount of time and ignore the rest, you might try a combination of rpc:async_call and rpc:nb_yield.

Maybe something like

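somefun() ->
    SmallTimeMs = 50,
    Nodes = AllConfiguredNodes,   % placeholder: bind this to your list of node names
    %% ArgList is likewise a placeholder for the arguments of some_mod:some_fun
    Promises = [rpc:async_call(N, some_mod, some_fun, ArgList) || N <- Nodes],
    get_results([], Promises, SmallTimeMs).


%% Note: this polls until enough results have arrived; there is no overall deadline.
get_results(Results, _Promises, _SmallTimeMs) when length(Results) > 1 ->   % Replace 1 with whatever is the minimum acceptable number of results
    Results;
get_results(Results, Promises, SmallTimeMs) ->
    Rs = get_promises(Promises, SmallTimeMs),
    get_results(Rs ++ Results, Promises, SmallTimeMs).


%% Keep only the values that have been delivered; promises that time out are skipped.
%% A delivered value may itself be a {badrpc, Reason} tuple.
get_promises(Promises, WaitMs) ->
    [Value || Key <- Promises, {value, Value} <- [rpc:nb_yield(Key, WaitMs)]].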

See: http://erlang.org/doc/man/rpc.html#async_call-4 for more details.
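
If a hard upper bound on the total wait matters, as it does in the question, the same two rpc calls can be combined with a deadline. A minimal sketch, assuming a hypothetical module bounded_rpc with a function call_all/5 (neither is part of OTP); only rpc:async_call/4, rpc:nb_yield/2 and erlang:monotonic_time/1 are real library calls:

```erlang
-module(bounded_rpc).
-export([call_all/5]).

%% Hypothetical helper: start one async rpc per node, then collect whatever
%% arrives before one shared deadline. Nodes that have not answered by then,
%% or that answered with {badrpc, Reason}, are left out of the result.
%% Returns [{Node, Value}] in the order of Nodes.
call_all(Nodes, Mod, Fun, Args, TimeoutMs) ->
    Deadline = erlang:monotonic_time(millisecond) + TimeoutMs,
    Promises = [{N, rpc:async_call(N, Mod, Fun, Args)} || N <- Nodes],
    collect(Promises, Deadline, []).

collect([], _Deadline, Acc) ->
    lists:reverse(Acc);
collect([{Node, Key} | Rest], Deadline, Acc) ->
    %% Wait only for the time left until the shared deadline (possibly 0 ms).
    Remaining = max(0, Deadline - erlang:monotonic_time(millisecond)),
    case rpc:nb_yield(Key, Remaining) of
        {value, {badrpc, _Reason}} -> collect(Rest, Deadline, Acc);
        {value, Value}             -> collect(Rest, Deadline, [{Node, Value} | Acc]);
        timeout                    -> collect(Rest, Deadline, Acc)
    end.
```

For the question's case this might be called as bounded_rpc:call_all(Nodes, gen_server, call, [broker, get_score], 3000). Replies that arrive after the deadline may still land in the caller's mailbox, so it is probably safest to run this from a short-lived process.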
