In MPI, each rank has a unique address space and communication between them happens via message passing.
I want to know how MPI works on a multicore machine with shared memory. If the ranks are on two different machines with no shared memory, then MPI has to use messages for communication. But if the ranks are on the same physical machine (each rank still having its own address space), will the MPI calls take advantage of the shared memory?
For example, suppose I'm issuing an ALLREDUCE call. I have two machines M1 and M2, each with 2 cores. Ranks R1 and R2 run on cores 1 and 2 of machine M1, and ranks R3 and R4 run on cores 1 and 2 of machine M2. How would the ALLREDUCE happen? Will more than one message be transmitted? Ideally, I would expect R1 and R2 to do a reduce using the shared memory available to them (and similarly R3 and R4), followed by a message exchange between M1 and M2.
Is there any documentation where I can read about the implementation details of collective operations in MPI libraries?