PARALLEL DATA REPRESENTATION

Felipe Bertrand, Randall Bramley
Indiana University. June 2002.

INDEX

  1. INTRODUCTION
  2. PREVIOUS WORK
    1. PAWS DATA MODEL
    2. CUMULVS DATA MODEL
    3. META-CHAOS DATA MODEL
  3. CCA DATA GROUP PROPOSAL
  4. REFERENCES

1. INTRODUCTION

This write-up is a presentation of current efforts regarding the representation of parallel data structures in high-performance scientific applications. The first part is a review of previous work, that includes systems such as PAWS (Parallel Application Workspace, University of California and Los Alamos National Laboratory), CUMULVS (Collaborative User Migration User Library for Visualization and Steering, Oak Ridge National Laboratory) and Meta-Chaos (University of Maryland). The second part is a review of the ongoing effort by the CCA (Common Component Architecture) Data Group.

Representing data in a parallel program can be conceptually divided in two operations: defining the layout, or how the data is spread through the processes, and instantiating the actual data. The most common operations that are performed in each of these two steps are summarized in the following table:

Layout
  • Specification of global domain (dimensions, index space)
  • Specification of distribution pattern
  • Specification of logical process topology
Actual Data
  • Specification of the data type
  • Allocation of storage space
  • Placement of process in logical process topology
  • Link data object to layout (including alignment of data within layout)

In the following sections we will see how the different systems implement some or all of these operations.

2. PREVIOUS WORK

2.1 PAWS (Parallel Application Workspace) DATA MODEL

PAWS  is a software infrastructure to connect separate parallel applications within a component-like model. It allows the applications to share parallel data structures. The applications specify the parallel layout of the data to be shared with the PAWS API. The applications can have different number of processes, different layout strategies and be written in different programming languages (C, C++, F77). PAWS uses MPI as its message-passing communication library (and still allows the individual applications to use MPI for their internal communications).

Paws supports two types of data structures: single scalar quantities and parallel multidimensional arrays. For parallel multidimensional arrays, applications must first define the global domain, which sets the index ranges and the stride for each of the dimensions of the data structure. The global domain must be known to all the processes of the parallel application.

The global domain is then divided in sub-domains. Sub-domains must tile completely the global domain. Each sub-domain needs not to be known to all the processes of the application: only the process that will actually allocate space for it is required to register it with PAWS. Also, one process can own more than one sub-domain.

Global and sub-domains constitute the layout of the parallel data structure. In paws terminology, this is called the representation.

Paws attaches representations to ports. Ports are connection endpoints capable of sending or receiving one particular representation. You can send many data objects through the same port (even data objects of different types) as long as they share the same representation.

Paws allows the user to create "empty" ports with no representation associated. When the connection is established, the application can retrieve the representation information from the other endpoint, and set the representation of the local port as it sees fit.

Once the representation is defined, the processes can specify the actual data objects. The actual information that is going to be sent through the port is stored in a view. A view contains a link to the port (and therefore to the layout information) as well as pointers to the actual blocks of memory space (view blocks) where the data resides. Multiple pointers can be passed to the view if the data is allocated in separate data blocks. When creating a view block, the application can specify an allocated space that is larger than the actual domain that the data object covers. This can be used to accommodate space for ghost cells, for example.

paws parallel data representation

2.2 CUMULVS DATA MODEL

CUMULVS is a software infrastructure to facilitate the remote visualization and steering of parallel scientific applications. In addition to providing mechanism for sharing parallel data structures, CUMULVS also provides services for remote steering and user directed check-pointing. In this section we will be concerned only with the services for remote visualization of parallel data structure.

CUMULVS supports multidimensional distributed arrays. However, this data model applies only to the sender side (the parallel application that is being "monitored"). The receiving side has no distributed description of the data. From the point of view of CUMULVS, the receiving side (the visualizer) is a regular single-process application. Although several viewers can be attached to the sender (each possibly requesting a different chunk of the computational domain), there is not a truly MxN communication because the receiving cohort of processes do not form a parallel application.

Multidimensional distributed arrays in Paws could be distributed arbitrarily among the processes that made up the parallel program. In CUMULVS it is not possible to distribute the arrays in a fully arbitrary manner. However, it does allow the most common used distribution patterns.

Data decompositions in CUMULVS are defined in a per-dimension (or per-axis) basis. First, the program must specify the number of dimensions of the global array. Then, for each dimension (axis), the program must specify the lower and upper bound of the index and kind of decomposition (block, cyclic, explicit or collapsed). The kind of decompositions supported are HPF inspired:
CUMULVS data representation

To completely define the representation, the user must provide a logical topology for the processes. In Paws this information was not necessary, because the user explicitly allocated each data block. In CUMULVS, on the other hand, this information is required by the system in order to assign appropriately the data blocks to the processes.

Once the layout is defined, the application must allocate the actual data and link it to the decomposition. The CUMULVS term for the actual data is data field. As in Paws, several data fields can share the same decomposition, and the data field definition allows for ghost cells. In addition to providing a pointer to the allocated space, the process must tell CUMULVS its position in the logical process topology.

As we mentioned before, the CUMULVS receiver application is not a parallel program. The visualizer (of viewer), requests a single data field or a collection of data fields to the library (this is to accommodate systems in which many variables are computed at the same time; e.g. "velocity" and "temperature"). For each data field or collection of data fields, the viewer must specify the portion of the data field or computational space that is requested. Other "goodies" in CUMULVS are the posibility of sampling the original data field (e.g. request 1 of every 3 values), make an automatic data type conversion (e.g. double -> int) and change the storage order (row/column major).

2.3 META-CHAOS DATA MODEL

Meta-Chaos is a library developed that allows applications to obtain copies with different distribution layouts of parallel data objetcs. It can be used to convert between data representations needed by different libraries in the same parallel program, or it can be used to exchange data between different programs, even if thos programs have different number of processes and different data organizations.

To map between the source distribution and the target distribution, Meta-chaos requires both sides to describe their data distribution by means of regions. A region can be, for example, a regularly distributed array section (like a "block"). Regions can be combined, forming sets. The transfer of data is then between a source set and a target set.

To copy the elements from the source set into the positions of the target set, Meta-Chaos defines an operation called linearization. A linearization is just an operation that provides a total ordering for the elements (or, more precisely, for the positions of the elements) in a set of regions. In order words, it "flattens" the data structure into a one dimensional linear array. To copy the elements, the source linearization is copied into the target linearization. Then, the target linearization is undone and the elements are placed in the corresponding position of the target distribution. Note however that this explanation is only conceptual. For performance issues, the Meta-Chaos library does not store the result of the linearization in intermediate data structures as we have suggested.

The only requisite for transferring data from source to destination is for the two sets of regions to have the same number of elements. Then, the respective linearizations define where each source element is going to be copied. This process is show in the next figure:


Meta-Chaos Data Representation

The strength of the Meta-Chaos approach is that it is very flexible. The concept of linearization is independant from the original data structure and it is in no way limited to the multidimensial array structures that we have presented here. Linearizations for more exotic data structures (like a tree data structure) can also be provided. Also, more flexibility can be achieved by providing more than one linearization for a particular data structure (e.g. depth-first and breadth-first linearizations for tree structures) . A disadvantage of this scheme, however, is that linearizations must be provided by the original library that implement the data structures.

3. CCA DATA GROUP PROPOSAL

The Common Component Architecture (CCA) is a standard component architecture for high performance computing. It defines a set of interfaces that a high performance component framework must provide to the components. The objective is to build frameworks that will allow interoperability between scientific components developed by different teams and institutions. A fundamental difference between CCA and the libraries that have been presented previously (Paws, CUMULVS and Meta-Chaos), is that the CCA proposal is concerned only with the definition of the abstract interfaces. It is not an implementation.

Concerning parallel data structures, the CCA proposal supports distributed multidimensional, rectangular, arrays. Following the same scheme as before, we have a mechanism to describe the layouts (templates, following CCA nomenclature), and a mechanism to link the actual data (descriptors) to the layout.

Templates store the number of dimensions of the distributed array, as well as the lower and upper bounds of each of the dimensions. As we saw in CUMULVS, a process topology is required when specifying the layout. The distribution types that CCA understands are similar to those in HPF and CUMULVS. These are the following:

1) Collapsed: all the dimension is held in the same process.

2) Block: this includes regular block distribution (one block per process) and cyclic block distributions (when the block size is smaller than the dimension divided by the number of processes).

3) GenBlock: this distribution allows for blocks or arbitrary sizes on each process, although it is limited to one block per process.

CCA data representation (block)

4) Implicit: this distribution is an arbitrary mapping of elements to processes (on a per-axis basis).

CCA data representation (implicit)


5) Explicit: this distribution is completely user specified. Explicit representation cannot be combined with all the other types of representations (for example, cyclic or block). To define a representation, the application must specify a set of square regions that completely cover the domain and do not overlap. Each region is defined specified in one (and only one) process (generally the process that will own the actual data for that region, but this is not required). Examples of explicit templates that cannot be obtained with any of the other types follow:

CCA data representation (explicit)


Once the template is specified, the application can build data objects. When creating the data object, the application must specify the template to which it is associated, the data type, each process position in the process topology, and the alignment of the data object within the template.

The template and the actual data object do not need to have the same size, not even the same number of dimensions. If the number of dimensions of the data object is greater than the number of dimensions of the template, the extra dimensions are assumed to be not distributed (collapsed).  If the data object has fewer dimensions than the template,  the result will depend on the details of the alignment mapping.

Setting the alignment means specifying how the actual data object relates to the reference template. There are several kinds of alignments:

1) Identity: This alignment maps each element position in the data object to the same position in the template:

data[i,j] maps to template[i,j]

2) Offset: The data object is alignment with the template with a simple offset:

data[i,j] maps to template[offset[0]+i,offset[1]+j]

3) General: This will be modeled following the HPF alignment capabilities. The details of this alignment have not yet been worked out by the CCA Data Group.

REFERENCES