0 votes

I'm trying to implement matrix-vector multiplication using MPI (i.e. an n×n matrix multiplied by an n×1 vector).

Originally, I decided to use multiple MPI_Bcast calls (before I noticed MPI_Allgather...) and I stumbled upon some strange behaviour. Apparently, data can be received no matter what root rank is passed to the MPI_Bcast call.

Here is part of the code used (the two functions are called right after each other, so the broadcast is sent before the broadcasts from the other ranks are received). The prints are just for debugging; I know the test data has length 2:

class Processor
{
public:
    Processor(int rank, int communicatorSize);

private:
    void broadcastOwnVectorToOtherRanks();
    void receiveBroadcastsFromOtherRanks();
    //...

    int ownRank;
    int communicatorSize;
    std::vector<int> ownVectorPart;
    std::vector<int> totalVector;
    //...
};

void Processor::broadcastOwnVectorToOtherRanks()
{
    //ownVectorPart is correctly filled before this function call
    std::printf("Own data in vector %d %d\n", ownVectorPart[0], ownVectorPart[1]);
    MPI_Bcast(ownVectorPart.data(), ownVectorPart.size(), MPI_INT, ownRank, MPI_COMM_WORLD);
}

void Processor::receiveBroadcastsFromOtherRanks()
{
    for (int rank = 0; rank < communicatorSize; ++rank)
    {
        if (rank == ownRank)
        {
            totalVector.insert(totalVector.end(), ownVectorPart.begin(), ownVectorPart.end());
        }
        else
        {
            std::vector<int> buffer(ownVectorPart.size());
            MPI_Bcast(buffer.data(), ownVectorPart.size(), MPI_INT, rank, MPI_COMM_WORLD);
            std::printf("Received from process with rank %d: %d %d\n", rank, buffer[0], buffer[1]);
            totalVector.insert(totalVector.end(), buffer.begin(), buffer.end());
        }
    }
}

Outcome (sorted by rank):

[0] Own data in vector 0 1
[0] Received from process with rank 1: 6 7
[0] Received from process with rank 2: 4 5
[0] Received from process with rank 3: 2 3
[1] Own data in vector 2 3
[1] Received from process with rank 0: 0 1
[1] Received from process with rank 2: 4 5
[1] Received from process with rank 3: 6 7
[2] Own data in vector 4 5
[2] Received from process with rank 0: 0 1
[2] Received from process with rank 1: 2 3
[2] Received from process with rank 3: 6 7
[3] Own data in vector 6 7
[3] Received from process with rank 0: 4 5
[3] Received from process with rank 1: 2 3
[3] Received from process with rank 2: 0 1

As you can see, in the processes with ranks 0 and 3 the data received differs from the data that was sent. For example, the process with rank 0 received rank 3's data even though it expected data from process 1.

It seems to me that the root rank is disregarded when receiving broadcast data, and MPI assigns the data in whatever order it arrives, whether or not it came from the expected rank.

Why does MPI_Bcast receive data from the process with rank 3 when the root passed as an argument is 1? Is calling MPI_Bcast multiple times concurrently undefined behaviour, or is there a bug in my code?

You are mixing up communicators and root ranks. In your snippet there is only one communicator, and it is MPI_COMM_WORLD. All the ranks of the communicator must invoke MPI_Bcast() with the same root rank; I suspect that is not the case in your program, hence the undefined behaviour. Please add a minimal reproducible example if you need more help. - Gilles Gouaillardet
@GillesGouaillardet Yes, sorry for the confusion about communicators and ranks. My problem is that all ranks are trying to broadcast data. First, each one sends a single broadcast (in broadcastOwnVectorToOtherRanks()); then every rank tries to receive broadcasts from all other ranks (in receiveBroadcastsFromOtherRanks()). Multiple broadcasts are being exchanged at the same time. I'll edit the question to try to clear it up. - Yksisarvinen
And yes, all broadcasts are received, e.g. the process with rank 3 will send one broadcast and receive a total of 3 broadcasts from ranks 0, 1 and 2. But the received data is mixed up, as if the root argument in MPI_Bcast were ignored. - Yksisarvinen
Once again, you are likely misusing MPI_Bcast(). All the ranks of the communicator must invoke the collective subroutine with the same root argument. I can only confirm this if you edit your question to include a minimal reproducible example. - Gilles Gouaillardet
@GillesGouaillardet What is missing from the complete example? main()? A PBS file for qsub? I could refactor it to avoid classes to try to simplify it. I'm trying to explain that all processes will call MPI_Bcast with the given root eventually. All processes will at some point call it with root == 0, with root == 1, etc. If this is not allowed, that's all I'm asking about. - Yksisarvinen

1 Answer

2 votes

Quoting the MPI 3.1 standard (section 5.12):

All processes must call collective operations (blocking and nonblocking) in the same order per communicator. In particular, once a process calls a collective operation, all other processes in the communicator must eventually call the same collective operation, and no other collective operation with the same communicator in between.

Combine this with section 5.4:

If comm is an intracommunicator, MPI_BCAST broadcasts a message from the process with rank root to all processes of the group, itself included. It is called by all members of the group using the same arguments for comm and root.

I interpret these two sections to mean that you must call MPI_Bcast (and similar collective communication functions) in the same order and with the same arguments on all processes. In particular, calling it with different root values on different processes is invalid.
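
As a rough sketch of how this applies to your loop (reusing the member names from your snippet, assuming every rank's ownVectorPart has the same length; the function name below is made up), each iteration can use the same root on every rank, with the root contributing its own part and everyone else receiving:

void Processor::gatherVectorViaBroadcasts() // hypothetical replacement for both functions
{
    totalVector.clear();
    for (int root = 0; root < communicatorSize; ++root)
    {
        // The root broadcasts its own part; every other rank receives into a buffer.
        std::vector<int> buffer = (root == ownRank)
            ? ownVectorPart
            : std::vector<int>(ownVectorPart.size());

        // Every rank makes this call with the same root, in the same order.
        MPI_Bcast(buffer.data(), static_cast<int>(buffer.size()), MPI_INT,
                  root, MPI_COMM_WORLD);

        totalVector.insert(totalVector.end(), buffer.begin(), buffer.end());
    }
}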

I believe MPI_Allgather is better suited to the communication you want: it gathers an equal amount of data from every process and copies the combined result to each process.
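
As a sketch (again reusing your member names and assuming every rank contributes the same number of elements; the function name is made up), the whole exchange collapses into a single collective call:

void Processor::gatherVectorViaAllgather() // hypothetical name
{
    const int partSize = static_cast<int>(ownVectorPart.size());
    totalVector.resize(static_cast<std::size_t>(partSize) * communicatorSize);

    // Every rank sends partSize ints and receives partSize ints from each rank;
    // the received parts end up in totalVector ordered by rank.
    MPI_Allgather(ownVectorPart.data(), partSize, MPI_INT,
                  totalVector.data(), partSize, MPI_INT,
                  MPI_COMM_WORLD);
}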