I'm trying to parallelize my calculations with rpc:pmap, but I'm a bit confused by its performance.

Here is a simple example:

-module(my_module).
-compile(export_all).

%% rpc:pmap calls this as do_apply(Elem, F): apply the fun F to one list element.
do_apply(X, F) -> F(X).

First, a test on a single node:

1> timer:tc( rpc, pmap, [{my_module, do_apply}, [fun(X) -> timer:sleep(10), X end], lists:seq(1,10000)] ).
{208198,
 [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,
  23,24,25,26,27|...]}

After that I connected a second node (a second Erlang shell process on the same OS):

([email protected])24> timer:tc( rpc, pmap, [{my_module, do_apply}, [fun(X) -> timer:sleep(10), X end], lists:seq(1,10000)] ).
{446284,
 [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,
  23,24,25,26,27|...]}

Finally, I connected a third node:

([email protected])26> timer:tc( rpc, pmap, [{my_module, do_apply}, [fun(X) -> timer:sleep(10), X end], lists:seq(1,10000)] ).
{483399,
 [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,
  23,24,25,26,27|...]}

So I get worse performance with three nodes than with a single node.

I realize that there is some overhead for communication between nodes. But how can I tell in which cases it is better to perform calculations on multiple nodes?

Edit:

My step-by-step test from shell:

1> c(my_module).
{ok,my_module}
2>  
2> List = lists:seq(1,10000).
[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,
 23,24,25,26,27,28,29|...]

Test performance on a single node:

3> timer:tc( rpc, pmap, [{my_module, do_apply}, [fun(X)-> timer:sleep(10), X end], List] ).
{207346,
 [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,
  23,24,25,26,27|...]}

Start distribution (enter the network environment):

4> net_kernel:start([one]).
{ok,<0.20066.0>}
([email protected])5> erlang:set_cookie(node(), foobar).
true

Add a second node:

([email protected])6> net_kernel:connect('[email protected]').
true
([email protected])7> 
([email protected])7> nodes().
['[email protected]']

Test performance with two nodes:

([email protected])8> timer:tc( rpc, pmap, [{my_module, do_apply}, [fun(X)-> timer:sleep(10), X end], List] ).
{510733,
 [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,
  23,24,25,26,27|...]}

Connect a third node:

([email protected])9> net_kernel:connect('[email protected]').
true
([email protected])10> nodes().
['[email protected]',
 '[email protected]']

Test performance with three nodes:

([email protected])11> timer:tc( rpc, pmap, [{my_module, do_apply}, [fun(X)-> timer:sleep(10), X end], List] ).
{496278,
 [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,
  23,24,25,26,27|...]}

P.S. I guess that performance decreases because I'm creating each node as a new Erlang shell process on the same physical machine, but I'm not sure whether that's right.

Can you try to generate the list before calling timer:tc? e.g. Seq = lists:seq(1,10000), timer:tc(..., Seq). - Isac
@Isac Yes, I tried that, but got similar results. I've edited my question with a step-by-step description of the test in the shell. - stemm

1 Answer

You don't need to add nodes to get parallelism in Erlang. A single node can run a large number of processes locally, and rpc:pmap already runs your function in parallel on that one node. This is easier to see if you make the wait longer:

timer:tc( rpc, pmap, [{my_module, do_apply}, [fun(X) -> timer:sleep(1000), X end], lists:seq(1,10000)] ).
{1158174,
 [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,
  23,24,25,26,27|...]}

If the sleeps ran sequentially, you would expect a minimum wait of 1000 ms * 10000 = 10,000,000 ms (about 2.8 hours). timer:tc reports microseconds, so the measured 1,158,174 is only about 1.16 seconds.
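The same comparison can be measured directly at a smaller scale. This is a minimal sketch, not part of the original test: the module name pmap_timing and the function compare/2 are made up for illustration, and it calls timer:sleep through rpc:pmap so that no custom module has to be loaded anywhere.

```erlang
-module(pmap_timing).
-export([compare/2]).

%% Returns {SequentialMicros, ParallelMicros} for N sleeps of SleepMs each.
%% timer:sleep/1 takes milliseconds; timer:tc/1 reports microseconds.
compare(N, SleepMs) ->
    Items = lists:duplicate(N, SleepMs),
    %% One sleep after another in the calling process.
    {Seq, _} = timer:tc(fun() -> lists:foreach(fun timer:sleep/1, Items) end),
    %% One rpc:pmap call per element, i.e. apply(timer, sleep, [SleepMs]).
    {Par, _} = timer:tc(fun() -> rpc:pmap({timer, sleep}, [], Items) end),
    {Seq, Par}.
```

Calling pmap_timing:compare(20, 10) on a single node should give a sequential time of at least 200,000 microseconds and a noticeably smaller parallel time; the exact figures depend on the machine.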

You are creating 3 separate Erlang VMs on one machine and connecting them to each other. rpc:pmap hands its calls out across node() and nodes(), so every call that lands on another VM pays for a round trip over the distribution channel, and the sleep itself gains nothing from an extra VM. With your current setup the additional VMs can only hurt performance, since they all compete for the same physical resources.
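One way to check which VMs actually take part is to have each call report where it ran. The sketch below is an assumption-laden illustration (the module pmap_probe and the function nodes_used/1 are hypothetical names). It relies only on erlang:whereis/1, which exists on every node, so nothing extra has to be loaded remotely.

```erlang
-module(pmap_probe).
-export([nodes_used/1]).

%% Runs N calls through rpc:pmap and returns the sorted list of
%% distinct nodes that executed at least one of them.
nodes_used(N) ->
    %% Each call is apply(erlang, whereis, [init]); it returns the pid of
    %% the init process on whichever node executed the call.
    Pids = rpc:pmap({erlang, whereis}, [], lists:duplicate(N, init)),
    %% node/1 maps each returned pid back to the node that owns it.
    lists:usort([node(P) || P <- Pids]).
```

On a VM with no connected nodes this returns a one-element list containing node(). After connecting other nodes, comparing the result with [node() | nodes()] shows which of them received work.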

Multiple nodes will only help performance if they are run on different physical resources.
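On a single machine, plain local processes are usually enough to get this kind of parallelism without rpc at all. A minimal sketch, assuming a hypothetical module local_pmap; it has no error handling beyond linking, so a crash in F also takes down the caller unless the caller traps exits.

```erlang
-module(local_pmap).
-export([pmap/2]).

%% Parallel map on the local VM only: one process per list element.
pmap(F, List) ->
    Parent = self(),
    %% Tag each worker with a unique reference so replies can be matched.
    Refs = [begin
                Ref = make_ref(),
                spawn_link(fun() -> Parent ! {Ref, F(X)} end),
                Ref
            end || X <- List],
    %% Selective receive on each reference keeps the original order.
    [receive {Ref, Result} -> Result end || Ref <- Refs].
```

For example, local_pmap:pmap(fun(X) -> X * 2 end, [1,2,3]) returns [2,4,6], with each element computed in its own process on the local node.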