
I am just learning Open MPI and tried a simple MPI_Scatter example:

#include <mpi.h>
#include <iostream>

using namespace std;

int main() {
    int numProcs, rank;

    MPI_Init(NULL, NULL);
    MPI_Comm_size(MPI_COMM_WORLD, &numProcs);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int* data;
    int num;

    data = new int[5];
    data[0] = 0;
    data[1] = 1;
    data[2] = 2;
    data[3] = 3;
    data[4] = 4;
    MPI_Scatter(data, 5, MPI_INT, &num, 5, MPI_INT, 0, MPI_COMM_WORLD);
    cout << rank << " received " << num << endl;

    MPI_Finalize();
    return 0;
}

But it didn't work as expected ...

I was expecting something like

0 received 0
1 received 1 
2 received 2 ... 

But what I got was

32609 received 
1761637486 received 
1 received 
33 received 
1601007716 received 

What's with the weird ranks? It seems to be something to do with my scatter. Also, why are the sendcount and recvcount the same? At first I thought that since I'm scattering 5 elements to 5 processes, each would get 1, so I should be using:

MPI_Scatter(data, 5, MPI_INT, &num, 1, MPI_INT, 0, MPI_COMM_WORLD);

But this gives an error:

[JM:2861] *** An error occurred in MPI_Scatter
[JM:2861] *** on communicator MPI_COMM_WORLD
[JM:2861] *** MPI_ERR_TRUNCATE: message truncated
[JM:2861] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort

I am wondering, though, why do I need to differentiate between the root and the child processes? It seems like in this case the source/root will also get a copy? Another thing: will the other processes run the scatter too? Probably not, but why? I thought all processes would run this code, since it's not inside the typical if I see in MPI programs:

if (rank == xxx) {

UPDATE

I noticed that for this to run, the send and receive buffers must be of the same length ... and the data should be declared like:

int data[5][5] = { {0}, {5}, {10}, {3}, {4} };

Notice the columns are declared with length 5, but I only initialized 1 value in each row? What is actually happening here? Is this code correct? Suppose I only want each process to receive 1 value.

Try MPI_Scatter(data, 1, MPI_INT, &num, 1, MPI_INT, 0, MPI_COMM_WORLD);, since the sendcount is the number of elements you want to send to EACH process, not the count of elements in the send buffer. – nhahtdh

@nhahtdh's comment is the correct answer; it's a common confusion with the scatter/gather operations for people using them for the first time. – Jonathan Dursi

@nhahtdh, maybe you can add an answer, then I can mark it as the answer :) – Jiew Meng

1 Answer


sendcount is the number of elements you want to send to each process, not the count of elements in the send buffer. MPI_Scatter will just take sendcount * [number of processes in the communicator] elements from the send buffer on the root process and scatter them to all processes in the communicator.

So to send 1 element to each of the processes in the communicator (assume there are 5 processes), set sendcount and recvcount to be 1.

MPI_Scatter(data, 1, MPI_INT, &num, 1, MPI_INT, 0, MPI_COMM_WORLD);
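
For completeness, here is a minimal corrected version of the program from the question (a sketch, assuming the job is launched with exactly 5 processes, e.g. mpirun -np 5):

#include <mpi.h>
#include <iostream>

using namespace std;

int main() {
    int numProcs, rank;

    MPI_Init(NULL, NULL);
    MPI_Comm_size(MPI_COMM_WORLD, &numProcs);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    // The send buffer only needs to be meaningful on the root process.
    int data[5] = { 0, 1, 2, 3, 4 };
    int num;

    // Send 1 int to each process; each process receives 1 int.
    MPI_Scatter(data, 1, MPI_INT, &num, 1, MPI_INT, 0, MPI_COMM_WORLD);
    cout << rank << " received " << num << endl;

    MPI_Finalize();
    return 0;
}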

There are restrictions on the possible datatype pairs, and they are the same as for point-to-point operations. The type map of recvtype should be compatible with the type map of sendtype, i.e. they should have the same list of underlying basic datatypes. Also, the receive buffer should be large enough to hold the received message (it might be larger, but not smaller). In most simple cases, the data type on the send and receive sides is the same, so the sendcount/recvcount pair and the sendtype/recvtype pair usually end up the same. An example where they can differ is when one uses a user-defined datatype on either side:

MPI_Datatype vec5int;

// Describe 5 contiguous MPI_INTs as a single datatype.
MPI_Type_contiguous(5, MPI_INT, &vec5int);
MPI_Type_commit(&vec5int);

int local_data[5];  // one vec5int on the receive side holds 5 ints
MPI_Scatter(data, 5, MPI_INT, local_data, 1, vec5int, 0, MPI_COMM_WORLD);

This works since the sender constructs messages of 5 elements of type MPI_INT while each receiver interprets the message as a single instance of a 5-element integer vector.
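
(A small note: once the datatype is no longer needed, it can be released with MPI_Type_free(&vec5int).)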

(Note that in MPI_Recv you specify the maximum number of elements to be received; the actual amount received might be less and can be obtained with MPI_Get_count. In contrast, in the recvcount of MPI_Scatter you supply the exact expected number of elements, so an error is thrown if the received message length is not exactly as promised.)
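
A minimal sketch of that MPI_Recv behaviour (the buffer size, source, and tag here are arbitrary):

MPI_Status status;
int buf[10];
// Receive *up to* 10 ints from rank 0 with tag 0.
MPI_Recv(buf, 10, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);

int actual;
MPI_Get_count(&status, MPI_INT, &actual);  // how many ints actually arrived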

Probably you know by now that the weird ranks printed out are caused by stack corruption, since num can only contain 1 int but 5 ints are received in MPI_Scatter.

I am wondering, though, why do I need to differentiate between the root and the child processes? It seems like in this case the source/root will also get a copy? Another thing: will the other processes run the scatter too? Probably not, but why? I thought all processes would run this code, since it's not inside the typical if I see in MPI programs?

It is necessary to differentiate between the root and the other processes in the communicator (they are not child processes of the root, since they can be on separate computers) in operations such as Scatter and Gather, since these are collective (group) communications but with a single source/destination. The single source/destination (the odd one out) is therefore called the root. All of the processes need to know the source/destination (the root process) to set up the send and receive correctly.

The root process, in the case of Scatter, will also receive a piece of data (from itself), and in the case of Gather, will also include its data in the final result. There is no exception for the root process unless "in place" operations are used. This applies to all collective communication functions.
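
As a sketch of the "in place" variant mentioned above: the root passes MPI_IN_PLACE as its receive buffer, so its own chunk simply stays in the send buffer:

if (rank == 0) {
    // Root: its chunk stays in data; recvcount/recvtype are ignored here.
    MPI_Scatter(data, 1, MPI_INT, MPI_IN_PLACE, 1, MPI_INT, 0, MPI_COMM_WORLD);
} else {
    // Non-root: the send arguments are ignored.
    MPI_Scatter(NULL, 1, MPI_INT, &num, 1, MPI_INT, 0, MPI_COMM_WORLD);
}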

There are also root-less global communication operations like MPI_Allgather, where one does not provide a root rank; rather, all ranks receive the gathered data.
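
For example (a sketch; all_nums is a hypothetical buffer of numProcs ints, present on every rank):

int all_nums[5];  // assuming numProcs == 5
// Every rank contributes num, and every rank receives the full gathered array.
MPI_Allgather(&num, 1, MPI_INT, all_nums, 1, MPI_INT, MPI_COMM_WORLD);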

All processes in the communicator will run the function (try excluding one process in the communicator and you will get a deadlock). You can imagine processes on different computers running the same code blindly. However, since each of them may belong to a different communicator group and have a different rank, the function runs differently on each process. Each process knows whether it is a member of the communicator, and each knows its own rank and can compare it to the rank of the root process (if any), so they can set up the communication or take extra actions accordingly.
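
To illustrate, here is a sketch where every rank executes the same MPI_Scatter call, but only the root bothers to fill the send buffer (the argument is simply ignored on the other ranks):

int *senddata = NULL;
if (rank == 0) {
    // Only the root needs a valid send buffer.
    senddata = new int[numProcs];
    for (int i = 0; i < numProcs; i++)
        senddata[i] = i;
}

// Every rank executes this same line; the root distributes, all ranks receive.
MPI_Scatter(senddata, 1, MPI_INT, &num, 1, MPI_INT, 0, MPI_COMM_WORLD);

if (rank == 0)
    delete[] senddata;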