Additional MPI Notes


Aliasing

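You must not alias input and output arguments in MPI communications. For example, in MPI_REDUCE() the same buffer cannot be passed as both sendbuf and recvbuf. (MPI-2 and later provide the special argument MPI_IN_PLACE for the cases where you want the result to overwrite the input.)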


Gathering Variants

The MPI_GATHER() function has the calling sequence
MPI_Gather(void *sendbuf,         /* starting address of send buffer   */
           int sendcount,         /* number of elements in send buffer */
           MPI_Datatype sendtype, /* data type of send buffer elements */ 
           void *recvbuf,         /* address of receive buffer         */
           int recvcount,         /* number of elements for any single
                                     receive                           */
           MPI_Datatype recvtype, /* data type of recv buffer elements */
           int root,              /* rank of receiving process         */
           MPI_Comm comm)         /* communicator group                */
Almost always recvtype and sendtype are the same, and sendcount and recvcount are equal. Both counts give the number of elements contributed by each process, not the total amount of data received. So if sendcount = recvcount = 1 and the communicator has p processes, then p data items are accumulated in recvbuf on the root, stored in rank order.
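To make the rank-order layout concrete, here is a minimal serial model of what the root ends up with after a gather. It is only a sketch and makes no MPI calls: the function name gather_model and the array-of-pointers stand-in for the per-process send buffers are assumptions made for illustration.

```c
#include <assert.h>
#include <string.h>

/* Serial model of the data layout produced by a gather on the root.
 * sendbufs[r] stands in for the send buffer of rank r (hypothetical
 * representation; in a real program each rank owns its own buffer).
 * Every rank contributes exactly `count` elements, and rank r's block
 * lands at offset r*count in recvbuf, so recvbuf needs p*count slots. */
void gather_model(int p, int count,
                  const double *const *sendbufs, double *recvbuf)
{
    assert(p > 0 && count >= 0);
    for (int r = 0; r < p; r++) {
        memcpy(recvbuf + (size_t)r * (size_t)count,
               sendbufs[r],
               (size_t)count * sizeof(double));
    }
}
```

The point to notice is the buffer size: with count elements per process the root must allocate p*count elements, even though recvcount itself is only count.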

MPI_Allgather() does the same thing, except that the data is accumulated into recvbuf on every process in the communicator. One shortcoming of MPI_Gather() and MPI_Allgather() is that sendcount must be identical on every process. For an iterative solver for a linear system of order n where mod(n,p) ≠ 0, one or more segments of the gathered vector would have to be padded so that they all have the same size. To avoid this, there is a vector variant:

MPI_Gatherv(void *sendbuf,
            int sendcount,
            MPI_Datatype sendtype,
            void *recvbuf,       
            int *recvcounts,    
            int *displs,
            MPI_Datatype recvtype,
            int root,            
            MPI_Comm comm)      
Here recvcount has been replaced by an array recvcounts, where recvcounts[i] is the number of entries sent by process i. The displs[] array lets you place the data almost anywhere you like in recvbuf: the data sent from process i is stored on process root starting at offset displs[i], measured in elements, from the beginning of recvbuf. The receive buffer is ignored on all non-root processes, but an argument should still be provided.
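As a rough sketch of how recvcounts[] and displs[] might be filled for the linear-solver case above, the helper below splits n elements over p processes so that the first mod(n,p) ranks hold one extra element and the segments are packed back to back. The name block_layout is hypothetical, and the even-split policy is an assumption; any non-overlapping layout would do.

```c
#include <assert.h>

/* Fill recvcounts[] and displs[] (each of length p) for a vector of
 * n elements distributed as evenly as possible over p processes.
 * Assumed policy: ranks 0 .. (n % p) - 1 get one extra element, and
 * segments are stored contiguously in rank order. */
void block_layout(int n, int p, int *recvcounts, int *displs)
{
    assert(n >= 0 && p > 0);
    int base = n / p;    /* elements every rank holds          */
    int extra = n % p;   /* number of ranks holding one more   */
    int offset = 0;      /* running start position in recvbuf  */
    for (int i = 0; i < p; i++) {
        recvcounts[i] = base + (i < extra ? 1 : 0);
        displs[i] = offset;
        offset += recvcounts[i];
    }
    /* A call on each rank would then look something like:
     *   MPI_Gatherv(sendbuf, recvcounts[rank], MPI_DOUBLE,
     *               recvbuf, recvcounts, displs, MPI_DOUBLE,
     *               root, comm);
     */
}
```

For n = 10 and p = 3 this gives counts 4, 3, 3 and displacements 0, 4, 7, so no padding is needed.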

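The "All" version, MPI_Allgatherv(), follows directly from the above ideas: it takes the same arguments except for root, and every process in the communicator receives the gathered data.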

Warning: This operation must not allow any overlapping of writes in the recvbuf on root (which is a form of aliasing). Any specification of counts, types, and displacements that lets two different processes write part of their send data to the same location in recvbuf is prohibited. Keep in mind that the writing of the data into that buffer is handled by the MPI run-time system, not by you. Unless you explicitly insert barriers or other forms of synchronization, the timing of those writes may differ widely from what you would expect from serial programming experience.
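One way to guard against an overlapping layout is to check the arrays on the root before the call. The sketch below is an illustration only: the name layout_is_disjoint is hypothetical, and it assumes recvtype is a basic, contiguous type so that each segment occupies exactly the element range displs[i] to displs[i] + recvcounts[i] - 1.

```c
#include <assert.h>
#include <stdbool.h>

/* Return true if no two nonempty segments
 *   [displs[i], displs[i] + recvcounts[i])
 * share an element. Offsets are in units of recvtype elements.
 * Assumes a contiguous recvtype; derived types with gaps would need
 * a check on the full type map instead. Gaps between segments are
 * allowed. The pairwise test is O(p^2), which is acceptable for a
 * one-time setup check. */
bool layout_is_disjoint(int p, const int *recvcounts, const int *displs)
{
    assert(p > 0);
    for (int i = 0; i < p; i++) {
        if (recvcounts[i] <= 0) continue;      /* empty: writes nothing */
        for (int j = i + 1; j < p; j++) {
            if (recvcounts[j] <= 0) continue;
            int lo_i = displs[i], hi_i = displs[i] + recvcounts[i];
            int lo_j = displs[j], hi_j = displs[j] + recvcounts[j];
            if (lo_i < hi_j && lo_j < hi_i)    /* half-open ranges meet */
                return false;
        }
    }
    return true;
}
```

With counts 4, 3, 3 the displacements 0, 4, 7 pass the check, while 0, 3, 7 fail because the segments from ranks 0 and 1 would both write element 3.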


Type Signatures

Note the weasel words above: recvtype and sendtype are "almost always" the same. When you use the vector variants of the MPI functions, or handle heterogeneous data with MPI derived datatypes or the MPI Pack/Unpack functions, the idea of a type signature is vital. The data part of a message is specified by a sequence of pairs
{(T_0, d_0), (T_1, d_1), (T_2, d_2), ..., (T_{n-1}, d_{n-1})}
where each T_i is an MPI datatype and d_i is a displacement into the message. The sequence of types alone,
{T_0, T_1, T_2, ..., T_{n-1}}
is called the type signature of the message. The rule is that the type signatures specified by sender and receiver must be compatible. So if a send operation sends a message with signature
{T_0, T_1, T_2, ..., T_{n-1}}
and the matching receive specifies a type signature
{U_0, U_1, U_2, ..., U_{m-1}}
then n must not be larger than m, and T_i = U_i for i = 0, 1, ..., n-1.
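The compatibility rule amounts to a prefix test: the sender's signature must be no longer than the receiver's, and must agree with it entry by entry over the sender's length. A minimal sketch follows, in which the enum basic_type is a hypothetical stand-in for MPI's basic datatypes and signature_matches is not an MPI function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a few MPI basic datatypes. */
typedef enum { T_INT, T_DOUBLE, T_CHAR } basic_type;

/* Return true if a send with signature send[0..n-1] is compatible
 * with a receive posted with signature recv[0..m-1]:
 * n <= m, and the first n entries agree. */
bool signature_matches(const basic_type *send, size_t n,
                       const basic_type *recv, size_t m)
{
    if (n > m)
        return false;                /* message longer than receive */
    for (size_t i = 0; i < n; i++) {
        if (send[i] != recv[i])
            return false;            /* type mismatch at position i */
    }
    return true;
}
```

Note the asymmetry: a receive that specifies more entries than were sent is acceptable, but the reverse is not.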

When you specify only a single datatype, as with MPI_Send, but a count larger than 1, MPI forms the type signature by concatenating count copies of the signature of the datatype specified.
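The concatenation can be sketched as follows. The integer codes SIG_INT and SIG_DOUBLE and the function expand_signature are hypothetical, used only to model the bookkeeping; MPI does this internally.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical integer codes standing in for MPI basic datatypes. */
enum { SIG_INT = 0, SIG_DOUBLE = 1 };

/* Build the signature of (count, datatype) by writing `count` copies
 * of the datatype's own signature type_sig[0..len-1] into out.
 * `cap` is the capacity of out. Returns the resulting length. */
size_t expand_signature(const int *type_sig, size_t len, size_t count,
                        int *out, size_t cap)
{
    size_t total = len * count;
    assert(total <= cap);
    for (size_t c = 0; c < count; c++)
        memcpy(out + c * len, type_sig, len * sizeof *type_sig);
    return total;
}
```

One consequence is that different (count, datatype) pairs can produce the same signature: four copies of a single integer type and two copies of a derived type whose signature is two integers both expand to four integers.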

The majority of scientific codes will naturally have type signatures that match on both sender and receiver. The main concern is for collective communications where the type signatures of all participating processes must be identical.

