Stacked MPI derived data types in Fortran

Question

MPI-2 allows us to create derived datatypes and send them by writing

call mpi_type_create_indexed_block(size,1,dspl_send,rtype,DerType,ierr)
call mpi_send(data,1,DerType,jRank,20,comm,ierr)

By doing this, the entries of data(N) at the positions listed in dspl_send are sent by the MPI library.

Now, for a matrix data(M,N) we can send the corresponding columns via the following code:

call mpi_type_create_indexed_block(size,M,dspl_send,rtype,DerTypeM,ierr)
call mpi_send(data,1,DerTypeM,jRank,20,comm,ierr)

That is, the entries data(i, dspl_send(j)) are sent.

My question concerns the role of the 1 in the subsequent mpi_send. Does it always have to be 1? Is another count possible? MPI derived datatypes are explained nicely in many documents on the internet, but the count in send/recv is always 1, with no mention of whether another count is allowed or how it could be used.

If we want to work with matrices data(M,N) with a size M that varies between calls, do we need to create a new derived datatype for every call? Is it impossible to use DerType for sending a matrix data(M,N) or data(N,M)?

Hristo Iliev · Accepted Answer · 2012-12-10T13:47:53

Each MPI datatype has two properties: size and extent. The size is the actual number of bytes that the datatype represents, while the extent is the number of bytes that the datatype covers in memory. Some datatypes are not contiguous, which means that their size might be less than their extent, e.g. (shown here in pseudocode)

MPI_TYPE_VECTOR(count = 1,
                blocklength = 10,
                stride = 20,
                oldtype = MPI_INTEGER,
                newtype = newtype)

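creates a datatype that takes the first 10 (blocklength) elements out of every 20 (stride). This datatype has a size of 10 times the size of MPI_INTEGER, which amounts to 40 bytes on most systems. Counting the whole stride as its footprint, its extent is twice as large, or 80 bytes on most systems. (Strictly speaking, MPI_TYPE_VECTOR sets the extent to span only from the first to the last element taken, so with count = 1 the trailing gap becomes part of the extent only after resizing the type with MPI_TYPE_CREATE_RESIZED.) If count were 2, it would take 10 elements, skip the next 10, take another 10 elements, and once again skip the next 10. Consequently its size and, counted the same way, its extent would be twice as large.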
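Size and extent can be queried directly with MPI_TYPE_SIZE and MPI_TYPE_GET_EXTENT. The following is a minimal sketch, assuming the `mpi` module is available and the program is run on a single process; the program and variable names are illustrative. It prints both values for the vector type from the pseudocode and for a resized copy whose extent is set to the full stride of 20 integers. The plain vector type should report an extent equal to its size, because the constructor does not count the trailing gap, while the resized copy should report an extent of 20 integers.

```fortran
program vector_extent
  use mpi
  implicit none
  integer :: vtype, rtype, tsize, ierr
  integer(kind=MPI_ADDRESS_KIND) :: lb, extent, intext

  call MPI_Init(ierr)

  ! Extent of one MPI_INTEGER, used to express the stride in bytes
  call MPI_Type_get_extent(MPI_INTEGER, lb, intext, ierr)

  ! count = 1, blocklength = 10, stride = 20, as in the pseudocode
  call MPI_Type_vector(1, 10, 20, MPI_INTEGER, vtype, ierr)
  call MPI_Type_size(vtype, tsize, ierr)
  call MPI_Type_get_extent(vtype, lb, extent, ierr)
  print *, 'vector : size =', tsize, ' extent =', extent

  ! Same type map, but with the extent set to the full stride
  call MPI_Type_create_resized(vtype, 0_MPI_ADDRESS_KIND, 20*intext, rtype, ierr)
  call MPI_Type_size(rtype, tsize, ierr)
  call MPI_Type_get_extent(rtype, lb, extent, ierr)
  print *, 'resized: size =', tsize, ' extent =', extent

  call MPI_Type_free(rtype, ierr)
  call MPI_Type_free(vtype, ierr)
  call MPI_Finalize(ierr)
end program vector_extent
```

Neither type is committed here, since committing is only needed before a type is used in communication.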

When you specify a certain element count in any MPI routine, e.g. MPI_SEND, MPI does something like this:

1. It initialises the internal data pointer with the address of the source buffer argument.
2. It consults the datatype's type map to decide how many bytes to take and from where, and appends them to the message being constructed. The number of bytes added equals the size of the datatype.
3. It increments the internal data pointer by the extent of the datatype.
4. It decrements the internal count and, if it is still non-zero, repeats the previous two steps.
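The loop described above can be mimicked in plain Fortran without calling MPI at all. This is only an illustration of the idea, not how any MPI library is actually implemented. It assumes a type map given as element offsets (rather than bytes) and an extent of 20 elements, and all names are made up for the example.

```fortran
program pack_loop
  implicit none
  integer, parameter :: extent = 20   ! footprint of one datatype instance, in elements (assumed)
  integer :: typemap(10)              ! element offsets taken by one instance
  integer :: src(40), msg(20)
  integer :: base, remaining, n, i

  src = [(i, i = 1, 40)]              ! source buffer holding 1..40
  typemap = [(i, i = 0, 9)]           ! take the first 10 elements of each instance

  base = 1                            ! start at the beginning of the source buffer
  remaining = 2                       ! plays the role of the count argument
  n = 0
  do while (remaining > 0)
    ! append the elements selected by the type map to the message
    msg(n+1:n+size(typemap)) = src(base + typemap)
    n = n + size(typemap)
    base = base + extent              ! advance by the extent, not by the size
    remaining = remaining - 1         ! one instance done
  end do

  print '(20i3)', msg
end program pack_loop
```

With a count of 2 the message ends up holding elements 1 to 10 followed by 21 to 30: the extent, not the size, decides where the second instance starts.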

One nifty feature of MPI is that the extent of a datatype is not required to match its size (as shown in the vector example), and one can even give the datatype whatever extent one wants using MPI_TYPE_CREATE_RESIZED. This allows very complex data access patterns to be created. For example, using MPI_SCATTERV to scatter a matrix by blocks that do not span entire rows (C) or columns (Fortran) requires the use of such resized types.

Back to the vector example. Provided the extent of the count = 1 type covers the full stride (which can be ensured with MPI_TYPE_CREATE_RESIZED), creating a vector type with count = 1 and calling MPI_SEND with count = 2 sends the same elements as creating a vector type with count = 2 and calling MPI_SEND with count = 1. Often one constructs a datatype that fully describes the object that one wants to send. In this case one gives count = 1 in the call to MPI_SEND. But there are cases when it might be more beneficial to create a datatype that describes only a portion of the object, for example a single part, and then call MPI_SEND with count set to the number of parts that one wants to send. Sometimes it is a matter of personal preference, sometimes it is a matter of algorithmic requirements.
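The equivalence of the two approaches can be checked on a single process by sending to oneself over MPI_COMM_SELF. This is a sketch under assumptions: the single-part type is explicitly resized so that its extent covers the full stride of 20 integers, and the program and variable names are illustrative only.

```fortran
program count_vs_type
  use mpi
  implicit none
  integer :: a(40), r1(20), r2(20)
  integer :: vec1, part, whole, ierr, i
  integer :: status(MPI_STATUS_SIZE)
  integer(kind=MPI_ADDRESS_KIND) :: lb, intext

  call MPI_Init(ierr)
  a = [(i, i = 1, 40)]
  call MPI_Type_get_extent(MPI_INTEGER, lb, intext, ierr)

  ! One part: 10 elements, with the extent stretched to the stride of 20
  call MPI_Type_vector(1, 10, 20, MPI_INTEGER, vec1, ierr)
  call MPI_Type_create_resized(vec1, 0_MPI_ADDRESS_KIND, 20*intext, part, ierr)
  call MPI_Type_commit(part, ierr)

  ! Whole object: two parts described by a single type
  call MPI_Type_vector(2, 10, 20, MPI_INTEGER, whole, ierr)
  call MPI_Type_commit(whole, ierr)

  ! count = 2 of the part type versus count = 1 of the whole type
  call MPI_Sendrecv(a, 2, part, 0, 0, r1, 20, MPI_INTEGER, 0, 0, &
                    MPI_COMM_SELF, status, ierr)
  call MPI_Sendrecv(a, 1, whole, 0, 1, r2, 20, MPI_INTEGER, 0, 1, &
                    MPI_COMM_SELF, status, ierr)

  print *, 'same elements received: ', all(r1 == r2)

  call MPI_Type_free(part, ierr)
  call MPI_Type_free(whole, ierr)
  call MPI_Type_free(vec1, ierr)
  call MPI_Finalize(ierr)
end program count_vs_type
```

Without the resize step, the second instance of the part type would start right after the first 10 elements instead of at element 21, and the two messages would differ.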

As to your last question, Fortran stores matrices in column-major order, which means that data(i,j) is next to data(i±1,j) in memory and not to data(i,j±1). Consequently, data(M,N) consists of N consecutive column vectors of M elements each. The distance between two elements in the same row, for example data(1,1) and data(1,2), depends on M. That's why you supply M in the type constructor. Matrices with a different number of rows (i.e. a different M) would not "fit" the type map of the created type, and the wrong elements would be used to construct the message.
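Since M is baked into the type, the type has to be rebuilt whenever M changes. One possible way to keep the rebuild simple is to stack two types: a contiguous type describing one column of M elements, and an indexed-block type that selects columns in units of that column type, so the same list of zero-based column offsets can be reused for any M. The sketch below is one such arrangement, not necessarily what the asker had in mind. It assumes a single process sending to itself over MPI_COMM_SELF, and the names `coltype`, `seltype` and `dspl` are hypothetical.

```fortran
program stacked_columns
  use mpi
  implicit none
  integer, parameter :: M = 3, N = 5, ncols = 2
  integer :: dspl(ncols) = [1, 3]     ! zero-based column offsets: columns 2 and 4
  double precision :: a(M, N), r(M, ncols)
  integer :: coltype, seltype, ierr, i, j
  integer :: status(MPI_STATUS_SIZE)

  call MPI_Init(ierr)
  do j = 1, N
    do i = 1, M
      a(i, j) = 10*j + i
    end do
  end do

  ! Inner type: one column of M elements (depends on M)
  call MPI_Type_contiguous(M, MPI_DOUBLE_PRECISION, coltype, ierr)
  ! Outer type: displacements are counted in units of the column type
  call MPI_Type_create_indexed_block(ncols, 1, dspl, coltype, seltype, ierr)
  call MPI_Type_commit(seltype, ierr)

  ! Send the selected columns to ourselves and receive them contiguously
  call MPI_Sendrecv(a, 1, seltype, 0, 0, r, M*ncols, MPI_DOUBLE_PRECISION, &
                    0, 0, MPI_COMM_SELF, status, ierr)

  print *, 'selected columns received: ', all(r == a(:, dspl + 1))

  call MPI_Type_free(seltype, ierr)
  call MPI_Type_free(coltype, ierr)
  call MPI_Finalize(ierr)
end program stacked_columns
```

When M changes, only these two constructor calls need to be repeated with the new M; the offset list stays the same because its unit is a whole column.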
