0
votes

MPI-2 allows us to create derived datatypes and send them by writing:

call mpi_type_create_indexed_block(size,1,dspl_send,rtype,DerType,ierr)
call mpi_type_commit(DerType,ierr)
call mpi_send(data,1,DerType,jRank,20,comm,ierr)

By doing this, the entries of data(N) at the positions listed in dspl_send are sent by the MPI library.

Now, for a matrix data(M,N), we can send selected columns via the following code:

call mpi_type_create_indexed_block(size,M,dspl_send,rtype,DerTypeM,ierr)
call mpi_type_commit(DerTypeM,ierr)
call mpi_send(data,1,DerTypeM,jRank,20,comm,ierr)

That is, the entries data(i, dspl_send(j)) are sent.

My question concerns the role of the 1 in the subsequent mpi_send. Does it always have to be 1? Is another count possible? MPI derived datatypes are explained nicely in many documents on the internet, but the count in send/recv is always 1, with no mention of whether another count is allowed and how it could be used.

If we want to work with matrices data(M,N) with a size M that varies between calls, do we need to create a new derived datatype on every call? Is it impossible to use DerType for sending a matrix data(M,N) or data(N,M)?

2 Answers

3
votes

Each MPI datatype has two properties: size and extent. The size is the actual number of bytes that the datatype represents, while the extent is the number of bytes that the datatype covers in memory. Some datatypes are not contiguous, which means that their size might be less than their extent, e.g. (shown here in pseudocode)

MPI_TYPE_VECTOR(count = 1,
                blocklength = 10,
                stride = 20,
                oldtype = MPI_INTEGER,
                newtype = newtype)

creates a datatype that takes the first 10 (blocklength) elements from a total of 20 (stride). This datatype has a size of 10 times the size of MPI_INTEGER, which amounts to 40 bytes on most systems. Its extent is two times larger, or 80 bytes on most systems. If count were 2, then it would take 10 elements, then skip the next 10, then take another 10 elements and once again skip the next 10. Consequently, its size and its extent would be twice as large.

When you specify a certain element count in any MPI routine, e.g. MPI_SEND, MPI does something like this:

  1. It initialises the internal data buffer with the address of the source buffer argument.
  2. It consults the datatype type map to decide how many bytes and from where to take and appends them to the message being constructed. The number of bytes added equals the size of the datatype.
  3. It increments the internal data pointer by the extent of the datatype.
  4. It decrements the internal count and if it is still non-zero, repeats the previous two steps.
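
The four steps above can be sketched as a small loop. This is only a model in plain Python, not MPI code: the function name pack is hypothetical, the buffer is a Python list, and displacements and extent are counted in elements instead of bytes to keep the example short.

```python
def pack(buffer, count, typemap, extent):
    """Model of how a send gathers `count` instances of a datatype.

    typemap: offsets (in elements) picked up for each datatype instance
    extent:  how far the read position advances per instance (in elements)
    """
    message = []
    base = 0                      # step 1: start at the beginning of the buffer
    while count > 0:
        # step 2: append the elements selected by the type map
        message.extend(buffer[base + disp] for disp in typemap)
        base += extent            # step 3: advance by the extent
        count -= 1                # step 4: decrement the count and repeat
    return message


buffer = list(range(12))
# Hypothetical datatype: take 2 elements, with an extent of 4 elements.
packed = pack(buffer, count=3, typemap=[0, 1], extent=4)
print(packed)  # [0, 1, 4, 5, 8, 9]
```

Each instance contributes two elements (its size), while the read position moves by four (its extent), which is why elements 2, 3, 6, 7, 10 and 11 never enter the message.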

One nifty feature of MPI is that the extent of the datatype is not required to match its size (as shown in the vector example), and one can even give the datatype whatever extent one wants using MPI_TYPE_CREATE_RESIZED. This allows for very complex data access patterns to be created. For example, using MPI_SCATTERV to scatter a matrix by blocks that do not span entire rows (C) or columns (Fortran) requires the use of such resized types.

Back to the vector example. Whether you create a vector type with count = 1 and then call MPI_SEND with count = 2, or you create a vector type with count = 2 and then call MPI_SEND with count = 1, the end result is the same. Often one constructs a datatype that fully describes the object that one wants to send. In this case one gives count = 1 in the call to MPI_SEND. But there are cases when it might be more beneficial to create a datatype that describes only a portion of the object, for example a single part, and then call MPI_SEND with count set to the number of parts that one wants to send. Sometimes it is a matter of personal preference, sometimes it is a matter of algorithmic requirements.

As to your last question, Fortran stores matrices in column-major order, which means that data(i,j) is next to data(i±1,j) in memory and not to data(i,j±1). Consequently, data(M,N) consists of N consecutive column vectors of M elements each. The distance between two elements in the same row, for example data(1,1) and data(1,2), depends on M. That's why you supply M in the type constructor. Matrices with a different number of rows (i.e. a different M) would not "fit" the type map of the created type, and the wrong elements would be used to construct the message.
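
To make the dependence on M concrete, here is a minimal sketch of the column-major address arithmetic. It is plain Python rather than Fortran, and the helper name offset is hypothetical; it simply returns the 0-based position of the 1-based element data(i,j) within the storage of data(M,N).

```python
def offset(i, j, M):
    """0-based linear offset of the 1-based Fortran element data(i, j) in data(M, N)."""
    return (i - 1) + (j - 1) * M


# Neighbours within a column are adjacent in memory.
print(offset(2, 1, M=3) - offset(1, 1, M=3))  # 1
# Neighbours within a row are M elements apart.
print(offset(1, 2, M=3) - offset(1, 1, M=3))  # 3
# The same pair of indices in a matrix with M = 5 is 5 elements apart.
print(offset(1, 2, M=5) - offset(1, 1, M=5))  # 5
```

A type map built for M = 3 therefore points at different elements when it is applied to a matrix with M = 5.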

0
votes

The description of extent in https://stackoverflow.com/a/13802243/7784768 is not entirely correct, as the extent does not take into account the padding at the end of the datatype. MPI datatypes are defined by a typemap:

typemap = ((type_0, disp_0), ..., (type_n-1, disp_n-1))

Extent is then defined according to

lb = min(disp_j)
ub = max(disp_j + sizeof(type_j)) + e
extent = ub - lb

where e can be non-zero due to alignment requirements.
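
The definition can be written out as a few lines of plain Python. This is a sketch, not an MPI call: the function name type_extent is hypothetical, each typemap entry is given as (size in bytes, displacement in bytes) instead of (type, displacement), and the alignment term e is passed in by hand as epsilon.

```python
def type_extent(typemap, epsilon=0):
    """Extent of a datatype given as (size_in_bytes, displacement_in_bytes) pairs."""
    lb = min(disp for _, disp in typemap)
    ub = max(disp + size for size, disp in typemap) + epsilon
    return ub - lb


# An 8-byte double at displacement 0 followed by a 1-byte char at displacement 8.
double_then_char = [(8, 0), (1, 8)]
print(type_extent(double_then_char))             # 9 without alignment padding
# Assuming doubles must be aligned to multiples of 8 bytes, e = 7.
print(type_extent(double_then_char, epsilon=7))  # 16
```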

This means that in the example

MPI_TYPE_VECTOR(count = 1,
                blocklength = 10,
                stride = 20,
                oldtype = MPI_INTEGER,
                newtype = newtype)

with count=1, typemap is

((int, 0), (int, 4), ... (int, 36))

and the extent is 40 on most systems, not 80 (i.e. the stride has no effect on the typemap in this case). For count=2, the typemap would be

((int, 0), (int, 4), ... (int, 36), (int, 80), (int, 84), ... (int, 116))

and extent 120 (40 bytes for the first block of 10 integers, 40 bytes for the stride, and 40 bytes for the second block of 10 integers, but the remaining stride is neglected in the extent). One can easily find out the extent with the MPI_Type_get_extent function.
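
The numbers above can be reproduced without an MPI installation. The following is a plain Python sketch under the assumption of a 4-byte MPI_INTEGER and no alignment padding; the helper names vector_typemap and size_and_extent are hypothetical, and in a real program MPI_Type_get_extent remains the authoritative source.

```python
INT_SIZE = 4  # assumption: MPI_INTEGER occupies 4 bytes


def vector_typemap(count, blocklength, stride, elem_size=INT_SIZE):
    """Byte displacements of every element selected by a vector type."""
    return [(block * stride + k) * elem_size
            for block in range(count)
            for k in range(blocklength)]


def size_and_extent(displacements, elem_size=INT_SIZE):
    """Size and extent in bytes, with the alignment term e taken as 0."""
    size = len(displacements) * elem_size
    extent = max(displacements) + elem_size - min(displacements)
    return size, extent


one = size_and_extent(vector_typemap(count=1, blocklength=10, stride=20))
two = size_and_extent(vector_typemap(count=2, blocklength=10, stride=20))
print(one)  # (40, 40)
print(two)  # (80, 120)
```

With an extent of 40, sending two elements of the count=1 type reads bytes 0 to 79 contiguously, whereas one element of the count=2 type reads bytes 0 to 39 and 80 to 119. The two only coincide if the extent of the count=1 type is set to 80 with MPI_TYPE_CREATE_RESIZED.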

Extent is quite a tricky concept, and it is easy to make mistakes when trying to communicate multiple elements of a derived datatype.