next up previous


Notes from the Data Group Meeting held in SLC July 28th 2004

Randall Bramley

1. Process Topologies

Need a Cartesian grid at least (and that might be all we need), to give the number of processors assigned to each dimension. This is for some algorithms that require a mapping from the linear 0, 1, ... p-1 processor ID's to a 2D or 3D ordered tuples layout. This is needed mainly for "block" distribution. It is not necessary to specify toroidal versus rectangular, since this is only for process ID's, but can be added later if necessary.

For other distributions, only a 1D layout seems to make sense. So those will also get a "process layout" but it will be a 1-D one mainly to communicate the number of processes involved. These other distributions only need it when they are involved in some interoperable mapping with another process layout.

Below we do *not* deal with ghost nodes (halos). They will possibly be added later to the Descriptor (which had a name change to DATemplate).

2. Distributed Array Descriptor Interfaces

Wael is keeping the actual SIDL code as we develop it. This has some of the items discussed and rationale but most of that material is here, to be later inserted as code comments. We will want to use Docbook for this, and the comments need to have Docbook keywords:

The DAD is not intended to be a general data model. Instead we are defining the minimal interface needed to support the basics needed for parallel computing: finding what process owns a particular array datum and finding the overall layout of processes and array data on another (possibly remote) component.

2.1 ProcessLayout

rank
is the number of dimensions [i.e., the Fortran sense of "rank"] Since SIDL arrays have a length associated with them, we can just use it for the rank and not store it separately. It maybe that in general we want to allow arrays larger than the actually used space, e.g. allocate for a 3D process layout but only use a 2D one. However, we decided to not have potentially two places where rank is stored and could be contradictory. Each extent must be 1 or larger.

ProcID
is the 0 to p-1 identifier. Should this be collective or not? We allow this to be set by users since they may choose to have it different from what MPI provides as a process ID. Need not be a collective call, we can use a value of -1 if it has not been set. We'll throw an exception if the layout is not consistent.

ProcCoord
is the tuple that gives the caller's Cartesian coordinates within the grid. Should be 0-based.

extents
possibly could be found by the application, because once everyone has called setProcCoord(), they can then determine the extents. However, we are providing extents as a separate array in case we don't require all of the processes to invoke setProcCoord().

getLayoutRank()
is the dimensionality of the process layout, the product of extents

getNumProc()
is the total number of processes in the process grid (p)

getMyProcId()
is a user-defined integer for algorithmic purposes. It allows giving a mapping to an external process ID space such as MPI ranks or PVM task IDs.

getMyProcCoord()
is meant to return the array that gives the tuple coordinate of the process. It has an "in" array because of SIDL restrictions to allow correct deallocation of arrays, but logically the array is an "out" argument.

2.1.1 General Notes

Probably will make this completely optional, a user need not implement the ProcessLayout. In almost all cases we will need it and so an exception would be thrown if it is not created, but for tutorials and beginning programming with DADF we want to make as much as possible non-required.

Advice to implementors: A default ProcessLayout should be provided.

2.2 ProcessLayout1d

2.2.1 General notes

This is to separate out the 1D layout info (0 to p-1 identifier) from the Cartesian. Primary purpose is to guarantee that the Distribution will get back a 1D layout and not potentially a Cartesian tuple based one. This subclasses ProcessLayout.

2.3 Distribution

2.3.1 General notes

Base for a small logical hierarchy:

                _______Distribution__________
               /         \        \          \
              /           \        \          \
             /             \        \          \
            /               \        \          \
           /                 \        \          \
       AxisDistribution   Irregular  Patch    GenBlock
        /       \       
       /         \      
Collapsed      SeparableGenBlock  
            \
         BlockCyclic
               \
              Block

The CartesianDist will have an array of the subdistributions collapsed, block. separable block with each axis assigned one of the three, allowing mixing as in HPF. The hierarchical structure shown for those three is based on using specialization for some inquiries.

AxisDistribution is a convenience interface for the per-axis distributions.

CollapsedDist: a nondistributed assignment, all elements belong to one process

BlockCyclicDist: a regular distribution. The number of elements is (almost) equal for each block in the given coordinate direction.

Handling of excess elements when the number of processes in a coordinate direction does not evenly divide the number of elements is specified using the enum BlockRemainder. In addition to putting the remainder into one partition versus spreading them among the processes so that all have a number of elements within one of each other, it is also possible to do assign the special blocks in different ways. E.g., the remainder could go into the first, last, or "middle" partition. The Intercomm multiparti document has an extensive collection of the options it supports. Only three are here as placeholders.

Deferred discussion: in Scalapack you can specify where the block assignments start, rather than assuming they start at process 0. [The is probably done to help in some crude load balancing in cases like Gaussian elimination where the block column sizes decrease as the algorithm proceeds.] The discussion is whether to add in an argument to the initialization that specifies the start process number. Another option is to require a user wanting to do this could use the separable block interface, and give block sizes of 0 to the processes that need to be skipped. This entails some additional cost in storage and work, but not as much as going to a full-blown irregular distribution.

Advice to implementor: probably want to distinguish between condition when a user has created a BlockDist but not yet invoked initialize(), and the condition where a user has called initialize but does not yet know the number of elements and hence cannot yet figure out the block size.

Note: we originally put the getNumBlocks and getBlocksizes methods into the distribution, but it really belongs in the template. Reason: it requires both a process layout and the number of elements in that axis. The first is known in the distribution object, but the second is not known until we have the actual array and its size(s).

BlockDist: similar to BlockCyclicDist, but does not wrap around. Instead the size of each block is determined from the number of elements in the array and the number of processes in that coordinate direction.

BlockDist picture in 2D array of processes case:

|-------------------------------|
|       |       |       |       |
|       |       |       |       |
|       |       |       |       |
|-------------------------------|
|       |       |       |       |
|       |       |       |       |
|       |       |       |       |
|-------------------------------|
|       |       |       |       |
|       |       |       |       |
|       |       |       |       |
|-------------------------------|
SeparableGenBlockDist: this is a separable distribution, in that each coordinate can be specified independently but the number of elements in each coordinate direction need not be (almost) equal.

The distribution in each coordinate direction is specified as a list of the numbers of elements assigned to each process. A user can generate a list of start indices from this easily. Using a list of block sizes makes the specification independent of issues such as 0-base or 1-based indexing. The number of elements used for a block count can be zero (see next note).

Note that the number b of blocks in a axis need not equal the number of processes p, or the number of processes along that axis in a Cartesian process layout. If b > p, then a cyclic assignment is made. If b < p, then only the first b processes are assigned elements. If you wish to have processes other than the first b to be assigned elements, intersperse block sizes of 0.

SeparableGenBlockDist picture in 2D array of processes case:

|-------------------------------|
|     |         |  |            |
|-------------------------------|
|     |         |  |            |
|-------------------------------|
|     |         |  |            |
|     |         |  |            |
|     |         |  |            |
|-------------------------------|
|     |         |  |            |
|     |         |  |            |
|     |         |  |            |
|-------------------------------|

The other distributions are non-separable; you cannot specify them coordinate by coordinate. GenBlockDist: a nonseparable distribution. Implemented using a tree data structure in Intercomm, where the number of levels in the tree is equal to the dimension d of the process layout. We are not sure how to specify the initializer here, because the distribution is algorithmically defined. Using a d-level tree is possibly the most succint, but that is not a base object in SIDL.

GenBlockDist in the 2D case:

|-------------------------------|
|       |       |       |       |
|       |       |       |       |
|       |       |       |       |
|-------------------------------|
|   |   |   |   |    |          |
|   |   |   |   |    |          |
|   |   |   |   |    |          |
|-------------------------------|
|         |       |             |
|         |       |             |
|         |       |             |
|-------------------------------|

IrregularDist: this is the most general possible distribution; every element is assigned to a process. This requires an integer array of length equal to the number of elements and typically is inefficient if one of the more regular distributions can instead be used. However, the only applications we can find which use this (graph algorithms, sparse matrices) only involve 1D data arrays. If anyone has a need for more than 1D arrays to be distributed in an irregular distribution like this, we can extend it later by adding other initializers that take "in array<int, 2>" or higher. For now, we have inserted a deferred intialization.

PatchDist: this is intended to cover almost completely general blocks that tile the element space. Currently this is done algorithmically by calls to insertPatch, the initializer is deferred for now.

PatchDist picture in 2D array of processes case:

|-------------------------------|
|       |       |       |       |
|       |-------|       |       |
|       |       |       |       |
|--------       ----------------|
|   |   |       |       |       |
|   |   |       |       |       |
|   |   |       |       |       |
|----------------       --------|
|               |       |       |
|               |       |       |
|               |       |       |
|-------------------------------|

2.4 DAtemplate

The template contains name, rank, and arrays of dimension = rank which give the lower and upper bounds of the indices along each dimension. The lower and upper bounds are in global indices, not local indices. The bounds are inclusive, so a lower bound of 0 and upper bound of 10 will specify eleven elements.

A template can be used for multiple different distributed arrays, since it contains only index space information.

2.5 DistArray

Will contain a template (and only one), and a set of local SIDL arrays for holding the data assigned to the owning process. Issue about this; we need to have a SIDL array of SIDL arrays, where the local arrays will be of the same rank but may have different sizes and types. E.g., need array< array<_,_>, 1 >

Tom looks really tired. We are temporarily going to do this with a single local array, which prevents the PatchDist and some algorithms we want even with regular distributions.

About this document ...

Notes from the Data Group Meeting held in SLC July 28th 2004

This document was generated using the LaTeX2HTML translator Version 98.1p1 release (March 2nd, 1998)

Copyright © 1993, 1994, 1995, 1996, 1997, Nikos Drakos, Computer Based Learning Unit, University of Leeds.

The command line arguments were:
latex2html -split 0 -show_section_numbers DAD-notes-28Jul04.ltx.

The translation was initiated by Felipe Bertrand on 2004-07-29


next up previous
Felipe Bertrand
2004-07-29