Notes on an Approach to Calculation Placement in HPF

Bryan Carpenter
University of Southampton


Some Definitions

A forall expression is a generalisation of an array expression, which need not be rectangular (and may include masks in its definition?---not yet fully investigated).

Any intermediate result in an HPF forall (or array operation) can be represented by a forall expression (by construction).

A forall expression is characterised by a set of indices. The range parameters of certain indices may depend on other indices. If the range parameters of a particular index don't depend on any other indices of the expression, call that index a Cartesian index.

Policies

Some fundamental policy decisions about mapping of forall expressions (equivalent to mapping of temporaries in forall statements; equivalent to placement of calculation):

  1. The mapping of a forall expression should be a legal HPF mapping.

    This means that only Cartesian indices can be distributed. Non-Cartesian indices are collapsed.

    E.g. a triangular forall expression used as a temporary in:

          FORALL (x = 1 : n)
            FORALL (y = 1 : x)
              a (x, y) = ...
    
    has a Cartesian index x and a non-Cartesian index y. x may be distributed, but y must be collapsed.

  2. Processor arrangements and distribution formats of forall expressions are always inherited from program variables.

    The mapping scheme has to make a plausible guess about which program variable to inherit these attributes from, and about an appropriate mapping (permutation + intra-dimensional alignment) of forall indices to the dimensions of the program variable in question.

    The mapping scheme does not attempt to make any decision about distribution formats or processor arrangments of program variables or expression temporaries. It is assumed that the programmer (or a mythical intelligent pre-processor tool, or something) has already made these decisions about the distributed arrays in the program---the mapping strategy just propagates the attributes from program variables to array (forall) expression temporaries.

    The mapping scheme also does not base any decision about the target program variable on the distribution format of that variable. The mapping scheme is blind to distribution format.

  3. In contrast, templates of forall expressions are generally synthesised by the mapping scheme.

    That is, a forall expression may be mapped to a template which is not the template of a program variable.

    This is because the shape of a forall expression may be different from the shape of any program variable template, and there may not be a natural embedding of the expression in the template of the program variable on which the distribution of the temporary is modelled.

    E.g.

          INTEGER a (n)
    
          FORALL (x = 1 : m)
            ... a (f (x)) ...
    
    the subexpression a (f (x)) may well inherit its distribution format from the program variable a, but there is no obvious mapping of the the elements of the subexpression to the template of a. Instead a new template with the same distribution format as a but extent m is likely to be synthesised.

    The template of a is then the parent template of the synthesised template.

  4. Before resorting to template synthesis, the mapping scheme will try to recycle the template of an existing variable.

          FORALL (x = 1 : m)
            ... a (2 * x) ...
    
    A highly favoured alignment of the subexpression a (2 * x) is to the existing template of a, with a stride of 2.

  5. Often a forall expression will have more indices than the rank of any plausible parent template.

    E.g. the forall statement has 3 forall indices, but no variable involved in the forall has rank higher than 2.

    In this case any forall indices which don't find a processor dimension to which they can map will be collapsed.

    This is regarded as an unfavourable outcome, because collapsing of intermediate arrays is equivalent to sequentalisation of computation. The mapping scheme should make a sensible attempt to find a large enough parent template before resorting to collapsing.

  6. Synthesis of mapping attributes from the children of a sub-expression is favoured over inheritance from the parent expression (but both strategies must be available).

A Model Language

Abstract syntax

  <expression> ::= arr <name_list> <expression>      |
                   idx <name>                        |
                   var <name> <expression_list>      |
                   fun <name> <expression_list>
An expression is an array expression which scopes (binds) a list of indices, an index reference expression---simply a reference to an index, a variable reference expression in which a named variable is is subscripted with a list of expressions, or a function reference expression, in which a function is applied to a list of expressions.

E.g.

  FORALL (x = 1 : n, y = 1 : n)
    a (x, y) = x + y
can be modelled by
  arr [x, y]
    fun = 
      var a
        idx x
        idx y
      fun +
        idx x
        idx y
[By embedding arr nodes inside expressions, this syntax can support functions with array arguments. That involves one or two complications in mapping. For illustration the toy scheme below assumes arr nodes are at the outermost level of expressions, as in a forall or array assignment involving only elemental operations].

Index Dependency

The first phase of mapping an expression in this language is index dependency analysis

For our purposes this just means decorating each subexpression with the set of indices on which it depends. [In full HPF it also involves deciding which expressions are linear in indices, and enumerating the linear representations.] This decoration can be achieved with a single post-order traversal. In Prolog:

  depen(arr(Is, E, Atr), Xs) :-
    depen(E, Ys),
    union(Ys, Is, Zs),       % Make sure all locally bound indices included
    atrISet(Atr, Zs),
    subtract(Zs, Is, Xs).    % Locally bound indices not passed up to parent
  depen(idx(X), [X]).
  depen(var(_, Es, Atr), Xs) :-
    depenList(Es, Xss),
    atrSubsISets(Atr, Xss),  % Save for future reference
    unionList(Xss, Xs),
    atrISet(Atr, Xs).
  depen(fun(_, Es, Atr), Xs) :-
    depenList(Es, Xss),
    unionList(Xss, Xs),
    atrISet(Atr, Xs).

  depenList([], []).
  depenList([E | Es], [Xs | Xss]) :-
    depen(E, Xs),
    depenList(Es, Xss).
[atrISet and similar predicates are structure selectors on the abstract data type representing node attributes. Various set operations (e.g. union, subtract, unionList, ...) will be made up as we go along, without explanation.]

Index Maps

Next we decorate the expression tree with index maps. An index map consists of a target template (actually a parent template in the terminology above), and a map from the index set of the node to the dimensions of the target template. Collapsed mappings are also allowed.

E.g. A node with index set [x, y] could have index map:

  map(a, 2, [x -> 2, y -> *])
a is a rank-two program variable (the second field is the rank of the target variable). Index x is mapped to the second dimension of [a template synthesised from the template of] a. Index y is collapsed. For simplicity, we will say that if a dimension of the template not targetted by any mapping (dimension 1 of the template, here) the expression is replicated in that dimension. [In full HPF, it would be necessary to consider embedding in such dimensions.]

In our model the main computation/communication focus is at function reference and variable reference nodes. There are two index maps associated with such nodes. Call these the source map and the destination map.

[This terminology doesn't always reflect the direction of data flow. The destination map is a map derived directly from the parent node, whereas the source map is locally constructed. For the left-hand side of an assignment (left argument of the function =), data will initially reside at locations defined by the destination map, and then be moved to locations defined by the source map...]

This phase also imposes horizontal constraints on index maps (hence the name of the predicates). A daughter which shares an index with a parent shares the destination-mapping of that index with the parent's source-mapping for the index. Thus siblings also share destination-mappings of common indices.

  constrain(arr(Is, E, Atr), T, R, P) :-  % T: variable, rank R.  P: mappings
    atrISet(Atr, Xs), subtract(Xs, Is, Ys),
				  % `Ys': idxs *not* bound by this arr node
    domain(L, Ys), subset(L, P),  % `L': map with domain `Ys', contained in `P'
    domain(M, Xs), subset(L, M),  % `M': map with domain `Xs', containing `L'
    atrNodMap(Atr, map(T, R, M)),
    constrain(E, T, R, M).
  constrain(idx(_), _, _, _).
  constrain(var(_, Es, Atr), T, R, P) :-
    atrISet(Atr, Xs),
    domain(L, Xs), subset(L, P),  % `L': map with domain `Xs', contained in `P'
    atrDstMap(Atr, map(T, R, L)),
    domain(M, Xs),
    atrSrcMap(Atr, map(U, E, M)), % `M': map with domain `Xs'
    constrainList(Es, U, E, M).
  constrain(fun(_, Es, Atr), T, R, P) :-
    atrISet(Atr, Xs),
    domain(L, Xs), subset(L, P),  % `L': map with domain `Xs', contained in `P'
    atrDstMap(Atr, map(T, R, L)),
    domain(M, Xs),
    atrSrcMap(Atr, map(U, E, M)), % `M': map with domain `Xs'
    constrainList(Es, U, E, M).

  constrainList([], _, _, _).
  constrainList([E | Es], T, R, P) :-
    constrain(E, T, R, P),
    constrainList(Es, T, R, P).

Mapping arguments passed in to the predicates are those of the parent node. After completion of this phase all targets remain uninstantiated, but the constraints mentioned above have been implemented by unification of logic variables.

Mapping Variable Reference Nodes

The instantiation of index map targets starts with variable reference nodes.

  varMap(arr(_, E, _))    :- varMap(E).
  varMap(idx(_)).
  varMap(var(V, Es, Atr)) :-
    % Attempt ``local reference mapping''...
    atrSubsISets(Atr, Xss),
    size(Es, R),          % Compute rank of `V' from number of subscripts
    atrSrcMap(Atr, map(V, R, M)),
    mapSingletSubscripts(Xss, M, 1),
    varMapList(Es).
  varMap(var(_, Es, _))   :-
    % ...or leave uninstantiated.
    varMapList(Es).
  varMap(fun(_, Es, _))   :-  varMapList(Es).

  varMapList([]).
  varMapList([E | Es]) :- varMap(E), varMapList(Es).

  mapSingletSubscripts([], _, _).
  mapSingletSubscripts([Xs | Xss], M, D) :-
    mapSingletSubscript(Xs, M, D),
    D1 is D + 1, mapSingletSubscripts(Xss, M, D1).

  mapSingletSubscript([X], M, D) :-
    % Attempt to map index `X' to dimension `D'...
    member(X -> D, M).
  mapSingletSubscript(_, _, _).
    % ...or leave mapping to `D' uninstantiated.

A local reference mapping of a variable reference node is one which takes the referenced variable as the template---a natural strategy.

The scheme doesn't require this strategy, because the expression node may have more indices than the rank of the variable, leading to undesirable, collapsing (sequentialisation).

E.g.

  FORALL (x = 1 : n, y = 1 : n)
    a (x, y) = v (x + y)
Rather than map the expression v (x + y) to the one-dimensional template v, collapsing one of the indices, it would be probably be better to map it directly to the two-dimensional template a.

So there is a second rule for mapping a variable, which leaves the target template uninstantiated at a node (deferring choice of template for the node to the subsequent propagation pass).

If a local reference mapping is selected, individual index to dimension mappings are selected on the basis of the index dependency of the subscript expression. If the D'th subscript was linear in index X, that would be a strong indication that X should be mapped to dimension D of the template. In the simplified exposition here, we haven't analysed linearity, but first claim to a particular dimension is given to indices appearing as the only index of subscript expressions in the relevant position.

E.g.

  FORALL (x = 1 : n, y = 1 : n)
    ... a (x, f (y)) ...
favoured (source) mapping of a (x, f (y)) will be
  map(a, 2, [x -> 1, y -> 2]).

E.g.

  FORALL (x = 1 : n, y = 1 : n, z = 1 : n)
    ... a (x, f (y, z)) ...
favoured mapping of a (x, f (y, z)) will be
  map(a, 2, [x -> 1, y -> _, z -> _]).
The `y' and `z' mappings are uninstantiated by this phase.

E.g.

  FORALL (x = 1 : n, y = 1 : n, z = 1 : n)
    ... a (x, x) ...
favoured mapping of a (x, x)) will be
  map(a, 2, [x -> 1])
first, then
  map(a, 2, [x -> 2]).

In the course of backtracking, both possibilities will be enumerated by:

  mapSingletSubscript([[x], [x]], [x -> D], 1).

Vertical Propagation

Vertical propagation of a map is propagation across the arc in the expression graph emerging from a function reference or variable reference. Propagation is implemented by unifying targets in the source and destination maps of these nodes.

  propagate(arr(_, E, _))    :- propagate(E).
  propagate(idx(_)).
  propagate(var(_, Es, Atr)) :-
    propagateList(Es),
    atrDstMap(Atr, MD),
    atrSrcMap(Atr, MS),
    propagateArc(MD, MS).
  propagate(fun(_, Es, Atr))   :-
    propagateList(Es),
    atrDstMap(Atr, MD),
    atrSrcMap(Atr, MS),
    propagateArc(MD, MS).

  propagateList([]).
  propagateList([E | Es]) :- propagate(E), propagateList(Es).

  propagateArc(MD, MD).  % propagate mapping...
  propagateArc(_, _).    % ...or don't
The adoption of a postorder traversal here favours synthesised over inheritted mappings.

Selection

The preceding phases generate many possible mappings [the art is to order the rules so that favourable mappings emerge early in the process of resolution]. These include many mappings where some map targets are uninstantiated. The last phase eliminates any candidates for which some nodes have uninstantiated target templates, eliminates any mappings where two different indices are mapped to the same dimension, and randomly instantiates any remaining uninstantiated dimension mappings. In particular it may collapse certain mappings. This is only allowed if the target is sufficiently large---we don't want to favour unnecessary sequentialisation. The argument CRank defines the smallest rank target template on which collapsing is permitted.

  select(arr(_, E, Atr), CRank) :-
    atrNodMap(Atr, map(T, R, M)),
    ground(T),
    mapAll(M, R, CRank),
    select(E, CRank).
  select(idx(_), _).
  select(var(_, Es, Atr), CRank) :-
    atrSrcMap(Atr, map(T, R, M)),
    ground(T),
    mapAll(M, R, CRank),
    selectList(Es, CRank).
  select(fun(_, Es, Atr), CRank) :-
    atrSrcMap(Atr, map(T, R, M)),
    ground(T),
    mapAll(M, R, CRank),
    selectList(Es, CRank).

  selectList([], _).
  selectList([E | Es], CRank) :- select(E, CRank), selectList(Es, CRank).

  mapAll(M, R, CRank) :-
    R < CRank, !,
    enum(R, Dims),        % enum(N, [1, ..., N]).
    inject(M, Dims).
  mapAll(M, R, _) :-
    enum(R, Dims),
    injectOrCollapse(M, Dims).

  inject([], _).
  inject([_ -> D | M], Range) :-
    member(D, Range),
    remove(D, Range, Range1),
    inject(M, Range1).

  injectOrCollapse([], _).
  injectOrCollapse([_ -> D | M], Range) :-
    member(D, Range),
    remove(D, Range, Range1),
    injectOrCollapse(M, Range1).
  injectOrCollapse([_ -> * | M], Range) :-
    injectOrCollapse(M, Range).
A plausible choice for CRank is the largest rank of any program variable appearing in the expression being mapped.

The CRank mechanism is not ideal, but more refined mechanisms look prohibitively expensive. Some device is needed to suppress over-eager collapsing of indices.

Example Mapping Results

E.g.

  %  FORALL (x = 1 : n, y = 1 : n)
  %    a (x, y) = x + y
  %  ENDFORALL

  arr x, y                           a:[x->1,y->2]
    fun =                            a:[x->1,y->2]            a:[x->1,y->2]
      var a                          a:[x->1,y->2]            a:[x->1,y->2]
	idx x
	idx y
      fun +                          a:[x->1,y->2]            a:[x->1,y->2]
	idx x
	idx y
Notes: mapping of LHS is `natural'. Terms on RHS inherit sensible mappings.

E.g.

  %  FORALL (x = 1 : n, y = 1 : n)
  %    v (x + n * (y - 1)) = x + y
  %  ENDFORALL
   

  arr x, y                           v:[x->1,y-> *]
    fun =                            v:[x->1,y-> *]           v:[x->1,y-> *]
      var v                          v:[x->1,y-> *]           v:[x->1,y-> *]
	fun +                        v:[x->1,y-> *]           v:[x->1,y-> *]
	  idx x
	  fun *                      v:[y-> *]                v:[y-> *]
	    var n                    n:[]                     v:[]
	    fun -                    v:[y-> *]                v:[y-> *]
	      idx y
	      var 1                  1:[]                     v:[]
      fun +                          v:[x->1,y-> *]           v:[x->1,y-> *]
	idx x
	idx y
Notes: y collapsed throughout expression.

E.g.

  %  FORALL (j = r1 : n)
  %    u (j) = a (r, j) / l (r)
  %  ENDFORALL
   

  arr j                              u:[j->1]
    fun =                            u:[j->1]                 u:[j->1]
      var u                          u:[j->1]                 u:[j->1]
	idx j
      fun /                          a:[j->2]                 u:[j->1]
	var a                        a:[j->2]                 a:[j->2]
	  var r                      r:[]                     a:[]
	  idx j
	var l                        l:[]                     a:[]
	  var r                      r:[]                     l:[]
Notes: arithmetic on RHS mapped to a. Not owner computes! Remapping after /.

Conclusion

The model presented here omits very many features of HPF---it doesn't handle index ranges, intra-dimensional alignment including `embedded' alignments, array arguments, array-valued functions, variable to variable alignment, dummy argument alignment, etc.

Work on generalising the approach to include all these features is well-advanced.

It is apparent that the generate and test approach to mapping adopted here is subject to a combinatorial explosion. So far we have been unable to find a plausible mapping scheme which didn't involve essential backtracking, and potentially exponential complexity. However, there are many ways to prune the search tree of the naive implementation given here, and achieve the same or essentially the same results. We are optimistic that a practical scheme will emerge.


Appendix

Complete Prolog from above notes.

?- op(700, yfx, ->).

% Index Dependency Analysis


depen(arr(Is, E, Atr), Xs) :-
  depen(E, Ys),
  union(Ys, Is, Zs),       % Make sure all locally bound indices included
  atrISet(Atr, Zs),
  subtract(Zs, Is, Xs).    % Locally bound indices not passed up to parent
depen(idx(X), [X]).
depen(var(_, Es, Atr), Xs) :-
  depenList(Es, Xss),
  atrSubsISets(Atr, Xss),  % Save for future reference
  unionList(Xss, Xs),
  atrISet(Atr, Xs).
depen(fun(_, Es, Atr), Xs) :-
  depenList(Es, Xss),
  unionList(Xss, Xs),
  atrISet(Atr, Xs).

depenList([], []).
depenList([E | Es], [Xs | Xss]) :-
  depen(E, Xs),
  depenList(Es, Xss).
 

% Horizontal Constraints


constrain(arr(Is, E, Atr), T, R, P) :-  % T: variable, rank R.  P: mappings
  atrISet(Atr, Xs), subtract(Xs, Is, Ys),
				% `Ys': idxs *not* bound by this arr node
  domain(L, Ys), subset(L, P),  % `L': map with domain `Ys', contained in `P'
  domain(M, Xs), subset(L, M),  % `M': map with domain `Xs', containing `L'
  atrNodMap(Atr, map(T, R, M)),
  constrain(E, T, R, M).
constrain(idx(_), _, _, _).
constrain(var(_, Es, Atr), T, R, P) :-
  atrISet(Atr, Xs),
  domain(L, Xs), subset(L, P),  % `L': map with domain `Xs', contained in `P'
  atrDstMap(Atr, map(T, R, L)),
  domain(M, Xs),
  atrSrcMap(Atr, map(U, E, M)), % `M': map with domain `Xs'
  constrainList(Es, U, E, M).
constrain(fun(_, Es, Atr), T, R, P) :-
  atrISet(Atr, Xs),
  domain(L, Xs), subset(L, P),  % `L': map with domain `Xs', contained in `P'
  atrDstMap(Atr, map(T, R, L)),
  domain(M, Xs),
  atrSrcMap(Atr, map(U, E, M)), % `M': map with domain `Xs'
  constrainList(Es, U, E, M).

constrainList([], _, _, _).
constrainList([E | Es], T, R, P) :-
  constrain(E, T, R, P),
  constrainList(Es, T, R, P).


% Variable Reference Mapping.


varMap(arr(_, E, _))    :- varMap(E).
varMap(idx(_)).
varMap(var(V, Es, Atr)) :-
  % Attempt ``local reference mapping''...
  atrSubsISets(Atr, Xss),
  size(Es, R),          % Compute rank of `V' from number of subscripts
  atrSrcMap(Atr, map(V, R, M)),
  mapSingletSubscripts(Xss, M, 1),
  varMapList(Es).
varMap(var(_, Es, _))   :-
  % ...or leave uninstantiated.
  varMapList(Es).
varMap(fun(_, Es, _))   :-  varMapList(Es).

varMapList([]).
varMapList([E | Es]) :- varMap(E), varMapList(Es).

mapSingletSubscripts([], _, _).
mapSingletSubscripts([Xs | Xss], M, D) :-
  mapSingletSubscript(Xs, M, D),
  D1 is D + 1, mapSingletSubscripts(Xss, M, D1).

mapSingletSubscript([X], M, D) :-
  % Attempt to map index `X' to dimension `D'...
  member(X -> D, M).
mapSingletSubscript(_, _, _).
  % ...or leave mapping to `D' uninstantiated.


% Propagation of Mappings.


propagate(arr(_, E, _))    :- propagate(E).
propagate(idx(_)).
propagate(var(_, Es, Atr)) :-
  propagateList(Es),
  atrDstMap(Atr, MD),
  atrSrcMap(Atr, MS),
  propagateArc(MD, MS).
propagate(fun(_, Es, Atr))   :-
  propagateList(Es),
  atrDstMap(Atr, MD),
  atrSrcMap(Atr, MS),
  propagateArc(MD, MS).

propagateList([]).
propagateList([E | Es]) :- propagate(E), propagateList(Es).

propagateArc(MD, MD).  % propagate mapping...
propagateArc(_, _).    % ...or don't


% Range Injection.


select(arr(_, E, Atr), CRank) :-
  atrNodMap(Atr, map(T, R, M)),
  ground(T),
  mapAll(M, R, CRank),
  select(E, CRank).
select(idx(_), _).
select(var(_, Es, Atr), CRank) :-
  atrSrcMap(Atr, map(T, R, M)),
  ground(T),
  mapAll(M, R, CRank),
  selectList(Es, CRank).
select(fun(_, Es, Atr), CRank) :-
  atrSrcMap(Atr, map(T, R, M)),
  ground(T),
  mapAll(M, R, CRank),
  selectList(Es, CRank).

selectList([], _).
selectList([E | Es], CRank) :- select(E, CRank), selectList(Es, CRank).

mapAll(M, R, CRank) :-
  R < CRank, !,
  enum(R, Dims),        % enum(N, [1, ..., N]).
  inject(M, Dims).
mapAll(M, R, _) :-
  enum(R, Dims),
  injectOrCollapse(M, Dims).

inject([], _).
inject([_ -> D | M], Range) :-
  member(D, Range),
  remove(D, Range, Range1),
  inject(M, Range1).

injectOrCollapse([], _).
injectOrCollapse([_ -> D | M], Range) :-
  member(D, Range),
  remove(D, Range, Range1),
  injectOrCollapse(M, Range1).
injectOrCollapse([_ -> * | M], Range) :-
  injectOrCollapse(M, Range).


%-----------------------------------------------------------------

% Miscellaneous ``set'' operations.

union([], X, X).
union([X | R], Y, Z) :- member(X, Y), !, union(R, Y, Z). 
union([X | R], Y, [X | Z]) :- union(R, Y, Z).

subtract(M, [], M).
subtract(M, [X | I], N) :- remove(X, M, P), subtract(P, I, N).

member(X, [X | _]).
member(X, [_ | Y]) :- member(X, Y).

remove(_, [], []).
remove(X, [X | L], L)       :- !.
remove(X, [Y | L], [Y | M]) :- remove(X, L, M).

subset([], _).
subset([X | M], N) :- member(X, N), !, subset(M, N).

size([], 0).
size([_ | R], N) :- ground(N), !, N > 0, N1 is N - 1, size(R, N1).
size([_ | R], N) :- size(R, N1), N is N1 + 1.

domain([], []).
domain([X -> _ | M], [X | Xs]) :- domain(M, Xs).

unionList([], []).
unionList([S | Ss], U) :- unionList(Ss, V), union(S, V, U).

enum(N, Is) :- enumFrom(1, N, Is).

enumFrom(I, N, [I | Is]) :- I =< N, !, I1 is I + 1, enumFrom(I1, N, Is).
enumFrom(_, _, []).

%-----------------------------------------------------------------

% Selector Predicates.

atrISet(atr(Xs, _, _, _), Xs).
atrSubsISets(atr(_, Xss, _, _), Xss).
atrNodMap(atr(_, _, M, _), M).
atrSrcMap(atr(_, _, M, _), M).
atrDstMap(atr(_, _, _, M), M).

%-----------------------------------------------------------------

% Run the whole mapping procedure

run(P, CRank) :-
    depen(P, _),
    constrain(P, _, _, _),
    varMap(P),
    propagate(P),
    select(P, CRank).

Author Bryan Carpenter (dbc@npac.syr.edu), 24 June 1996.
Converted to HTML by John Merlin, 31 October 1997.