John Merlin
VCPC, and
Scott Baden,
University of California at San Diego.
Some important non-uniform applications entail taking finite differences on an irregular collection of uniform meshes. Examples include adaptive mesh refinement and multiblock problems.
Such applications cannot readily be implemented purely in High Performance Fortran (HPF), as HPF is designed primarily for data parallelism over regular rectangular regions. However, since the components of an irregular data domain may themselves be regular, it is appropriate to apply HPF to the individual meshes, and to use an SPMD program written in another language to coordinate the HPF programs and to perform the required inter-block data movements, i.e. to `glue' the pieces together.
We have implemented such a `SPMD-HPF' programming system using two public domain software tools:
KeLP handles the array allocation and decomposition for an irregular set of blocks, and the data motion between them. It invokes HPF procedures on each block as a separate parallel computation.
The main task involved in interfacing KeLP to HPF is to arrange for KeLP to pass arguments to HPF procedures in the required form. The details of how arguments are passed in HPF depend on which HPF compiler is used. With the SHPF compiler, each array argument is accompanied by an extra argument (not visible in the original HPF code) called a mapping descriptor, which contains complete information about the array's mapping (i.e. its distribution). This convention is used by most HPF compilers, although the details of how information is stored in the descriptor is compiler-dependent.
To accommodate this, new KeLP classes were derived from the existing KeLP classes to store the necessary extra information, e.g. the mapping descriptors (in the format required by SHPF). In addition, some macros were defined to assist in calling HPF routines from KeLP.
For example, the HPF subroutine:
SUBROUTINE update (a, b)
REAL a (100), b
...
END
is called from KeLP as follows:
HPF_SUBR(update) (HPF_ARRAY_ARG(array), & b);
The macro HPF_SUBR changes the subroutine name
update in accordance with SHPF's `name-mangling' convention,
and the macro HPF_ARRAY expands the KeLP object
array into two comma-separated arguments - a data array
and its mapping descriptor - as expected by SHPF-compiled HPF procedures.
The address operator & is applied to scalar arguments such
as b, since in Fortran arguments are passed by address
rather than by value.
The same KeLP-HPF application interface should work with most HPF compilers. It is simply a matter of redefining the KeLP classes and macros suitably for the given HPF compiler.
The integration also involved enhancing SHPF's run-time system
to support general MPI communicators, so that HPF code may be run
within a dynamically defined subset of the processors assigned to the run.
As well as allowing the concurrent execution of different HPF procedures,
which is necessary for the KeLP-HPF implementation, this will also enable
task parallelism within HPF, and is necessary to
fully support some HPF features, such as PURE procedures
in FORALL, INDEPENDENT loops and the tasking
construct (an HPF-2 `Approved Extension').
To demonstrate the system we have written a simple KeLP-HPF program to solve Poisson's Equation by red-black relaxation on a domain consisting of a union of rectangular blocks of various sizes. The number of blocks, their sizes and positions, and their allocation to processors are input at runtime. The red-black relaxation within each block is coded in HPF, and KeLP is used to exchange data between blocks and to invoke HPF concurrently on the different blocks. This program may be downloaded here (gzipped tarfile, 4.7 Kb).
For example, the following input data to the red-black program:
5 0 6 0 11 1 2 0 5 16 0 6 2 1 2 0 11 10 16 2 1 4 0 6 15 21 1 1 6 0 16 20 26 3 1 7sets-up a domain consisting of 5 rectangular blocks as shown on the right, and distributes them (from bottom to top) over 2, 2, 2, 1 and 3 processors respectively (which means that each processor receives the same number of grid points, i.e. a 5 x 5 subgrid, so the computation is load-balanced).
Please address comments and questions about this work to John Merlin, jhm@vcpc.univie.ac.at.