I use MPI_Isend to transfer an array of chars to a slave node. When the array is small it works, but when I enlarge the array, the program hangs.
Code running on the master node (rank 0):
MPI_Send(&text_length,1,MPI_INT,dest,MSG_TEXT_LENGTH,MPI_COMM_WORLD);
MPI_Isend(text->chars, 360358,MPI_CHAR,dest,MSG_SEND_STRING,MPI_COMM_WORLD,&request);
MPI_Wait(&request,&status);
Code running on the slave node (rank 1):
MPI_Recv(&count,1,MPI_INT,0,MSG_TEXT_LENGTH,MPI_COMM_WORLD,&status);
MPI_Irecv(host_read_string,count,MPI_CHAR,0,MSG_SEND_STRING,MPI_COMM_WORLD,&request);
MPI_Wait(&request,&status);
Note that the count parameter in MPI_Isend is 360358, which seems to be too large for MPI. When I set it to 1024, it works fine.
This problem has confused me for a few days. I know that there is a limit on the size of data transferred by MPI. As far as I know, MPI_Send is used to send short messages and MPI_Isend can send larger ones, so I use MPI_Isend.
The network configuration on rank 0 is:
$ ifconfig -a
eth0 Link encap:Ethernet HWaddr 00:1B:21:D9:79:A5
inet addr:192.168.0.101 Bcast:192.168.0.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:393267 errors:0 dropped:0 overruns:0 frame:0
TX packets:396421 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:35556328 (33.9 MiB) TX bytes:79580008 (75.8 MiB)
eth0.2002 Link encap:Ethernet HWaddr 00:1B:21:D9:79:A5
inet addr:10.111.2.36 Bcast:10.111.2.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:133577 errors:0 dropped:0 overruns:0 frame:0
TX packets:127677 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:14182652 (13.5 MiB) TX bytes:17504189 (16.6 MiB)
eth1 Link encap:Ethernet HWaddr 00:1B:21:D9:79:A4
inet addr:192.168.1.101 Bcast:192.168.1.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:206981 errors:0 dropped:0 overruns:0 frame:0
TX packets:303185 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:168952610 (161.1 MiB) TX bytes:271792020 (259.2 MiB)
eth2 Link encap:Ethernet HWaddr 00:25:90:91:6B:56
inet addr:10.111.1.36 Bcast:10.111.1.255 Mask:255.255.254.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:26459977 errors:0 dropped:0 overruns:0 frame:0
TX packets:15700862 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:12533940345 (11.6 GiB) TX bytes:2078001873 (1.9 GiB)
Memory:fb120000-fb140000
eth3 Link encap:Ethernet HWaddr 00:25:90:91:6B:57
BROADCAST MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)
Memory:fb100000-fb120000
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:1894012 errors:0 dropped:0 overruns:0 frame:0
TX packets:1894012 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:154962344 (147.7 MiB) TX bytes:154962344 (147.7 MiB)
The network configuration on rank 1 is:
$ ifconfig -a
eth0 Link encap:Ethernet HWaddr 00:1B:21:D9:79:5F
inet addr:192.168.0.102 Bcast:192.168.0.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:328449 errors:0 dropped:0 overruns:0 frame:0
TX packets:278631 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:47679329 (45.4 MiB) TX bytes:39326294 (37.5 MiB)
eth0.2002 Link encap:Ethernet HWaddr 00:1B:21:D9:79:5F
inet addr:10.111.2.37 Bcast:10.111.2.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:94126 errors:0 dropped:0 overruns:0 frame:0
TX packets:53782 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:8313498 (7.9 MiB) TX bytes:6929260 (6.6 MiB)
eth1 Link encap:Ethernet HWaddr 00:1B:21:D9:79:5E
inet addr:192.168.1.102 Bcast:192.168.1.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:121527 errors:0 dropped:0 overruns:0 frame:0
TX packets:41865 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:158117588 (150.7 MiB) TX bytes:5084830 (4.8 MiB)
eth2 Link encap:Ethernet HWaddr 00:25:90:91:6B:50
inet addr:10.111.1.37 Bcast:10.111.1.255 Mask:255.255.254.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:26337628 errors:0 dropped:0 overruns:0 frame:0
TX packets:15500750 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:12526923258 (11.6 GiB) TX bytes:2032767897 (1.8 GiB)
Memory:fb120000-fb140000
eth3 Link encap:Ethernet HWaddr 00:25:90:91:6B:51
BROADCAST MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)
Memory:fb100000-fb120000
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:1895944 errors:0 dropped:0 overruns:0 frame:0
TX packets:1895944 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:154969511 (147.7 MiB) TX bytes:154969511 (147.7 MiB)
You can use MPI_Send or MPI_Isend for either large or small messages. Usually the problem is that the send and receive calls aren't matching up correctly. This is especially true if things work for small messages and not for large ones. - Wesley Bland