A forall expression is a generalisation of an array expression, which need not be rectangular (and may include masks in its definition?---not yet fully investigated).
Any intermediate result in an HPF forall (or array operation) can be represented by a forall expression (by construction).
A forall expression is characterised by a set of indices. The range parameters of certain indices may depend on other indices. If the range parameters of a particular index don't depend on any other indices of the expression, call that index a Cartesian index.
Some fundamental policy decisions about mapping of forall expressions (equivalent to mapping of temporaries in forall statements; equivalent to placement of calculation):
This means that only Cartesian indices can be distributed. Non-Cartesian indices are collapsed.
E.g. a triangular forall expression used as a temporary in:
FORALL (x = 1 : n)
FORALL (y = 1 : x)
a (x, y) = ...
has a Cartesian index x and a non-Cartesian index
y. x may be distributed, but y
must be collapsed.
The mapping scheme has to make a plausible guess about which program variable to inherit these attributes from, and about an appropriate mapping (permutation + intra-dimensional alignment) of forall indices to the dimensions of the program variable in question.
The mapping scheme does not attempt to make any decision about distribution formats or processor arrangments of program variables or expression temporaries. It is assumed that the programmer (or a mythical intelligent pre-processor tool, or something) has already made these decisions about the distributed arrays in the program---the mapping strategy just propagates the attributes from program variables to array (forall) expression temporaries.
The mapping scheme also does not base any decision about the target program variable on the distribution format of that variable. The mapping scheme is blind to distribution format.
That is, a forall expression may be mapped to a template which is not the template of a program variable.
This is because the shape of a forall expression may be different from the shape of any program variable template, and there may not be a natural embedding of the expression in the template of the program variable on which the distribution of the temporary is modelled.
E.g.
INTEGER a (n)
FORALL (x = 1 : m)
... a (f (x)) ...
the subexpression a (f (x)) may well inherit its
distribution format from the program variable a, but there
is no obvious mapping of the the elements of the subexpression to the
template of a. Instead a new template with the same
distribution format as a but extent m is
likely to be synthesised.
The template of a is then the parent template
of the synthesised template.
Before resorting to template synthesis, the mapping scheme will try to recycle the template of an existing variable.
FORALL (x = 1 : m)
... a (2 * x) ...
A highly favoured alignment of the subexpression a (2 * x)
is to the existing template of a, with a stride of 2.
Often a forall expression will have more indices than the rank of any plausible parent template.
E.g. the forall statement has 3 forall indices, but no variable involved in the forall has rank higher than 2.
In this case any forall indices which don't find a processor dimension to which they can map will be collapsed.
This is regarded as an unfavourable outcome, because collapsing of intermediate arrays is equivalent to sequentalisation of computation. The mapping scheme should make a sensible attempt to find a large enough parent template before resorting to collapsing.
Synthesis of mapping attributes from the children of a sub-expression is favoured over inheritance from the parent expression (but both strategies must be available).
Abstract syntax
<expression> ::= arr <name_list> <expression> |
idx <name> |
var <name> <expression_list> |
fun <name> <expression_list>
An expression is an array expression which scopes (binds)
a list of indices, an index reference expression---simply
a reference to an index, a variable reference expression in
which a named variable is is subscripted with a list of expressions,
or a function reference expression, in which a function
is applied to a list of expressions.
E.g.
FORALL (x = 1 : n, y = 1 : n)
a (x, y) = x + y
can be modelled by
arr [x, y]
fun =
var a
idx x
idx y
fun +
idx x
idx y
[By embedding arr nodes inside expressions, this syntax
can support functions with array arguments. That involves one or two
complications in mapping. For illustration the toy scheme below assumes
arr nodes are at the outermost level of expressions, as in
a forall or array assignment involving only elemental operations].
The first phase of mapping an expression in this language is index dependency analysis
For our purposes this just means decorating each subexpression with the set of indices on which it depends. [In full HPF it also involves deciding which expressions are linear in indices, and enumerating the linear representations.] This decoration can be achieved with a single post-order traversal. In Prolog:
depen(arr(Is, E, Atr), Xs) :-
depen(E, Ys),
union(Ys, Is, Zs), % Make sure all locally bound indices included
atrISet(Atr, Zs),
subtract(Zs, Is, Xs). % Locally bound indices not passed up to parent
depen(idx(X), [X]).
depen(var(_, Es, Atr), Xs) :-
depenList(Es, Xss),
atrSubsISets(Atr, Xss), % Save for future reference
unionList(Xss, Xs),
atrISet(Atr, Xs).
depen(fun(_, Es, Atr), Xs) :-
depenList(Es, Xss),
unionList(Xss, Xs),
atrISet(Atr, Xs).
depenList([], []).
depenList([E | Es], [Xs | Xss]) :-
depen(E, Xs),
depenList(Es, Xss).
[atrISet and similar predicates are structure selectors on the
abstract data type representing node attributes. Various set
operations (e.g. union, subtract, unionList,
...) will be made up as we go along, without explanation.]
Next we decorate the expression tree with index maps. An index map consists of a target template (actually a parent template in the terminology above), and a map from the index set of the node to the dimensions of the target template. Collapsed mappings are also allowed.
E.g. A node with index set [x, y] could have index map:
map(a, 2, [x -> 2, y -> *])
a is a rank-two program variable (the second field is the rank
of the target variable). Index x is mapped to the second
dimension of [a template synthesised from the template of]
a. Index y is collapsed. For simplicity,
we will say that if a
dimension of the template not targetted by any mapping (dimension 1 of
the template, here) the expression is replicated in that dimension.
[In full HPF, it would be necessary to consider embedding in such
dimensions.]
In our model the main computation/communication focus is at function reference and variable reference nodes. There are two index maps associated with such nodes. Call these the source map and the destination map.
[This terminology doesn't always reflect the direction of data flow. The destination map is a map derived directly from the parent node, whereas the source map is locally constructed. For the left-hand side of an assignment (left argument of the function =), data will initially reside at locations defined by the destination map, and then be moved to locations defined by the source map...]
This phase also imposes horizontal constraints on index maps (hence the name of the predicates). A daughter which shares an index with a parent shares the destination-mapping of that index with the parent's source-mapping for the index. Thus siblings also share destination-mappings of common indices.
constrain(arr(Is, E, Atr), T, R, P) :- % T: variable, rank R. P: mappings
atrISet(Atr, Xs), subtract(Xs, Is, Ys),
% `Ys': idxs *not* bound by this arr node
domain(L, Ys), subset(L, P), % `L': map with domain `Ys', contained in `P'
domain(M, Xs), subset(L, M), % `M': map with domain `Xs', containing `L'
atrNodMap(Atr, map(T, R, M)),
constrain(E, T, R, M).
constrain(idx(_), _, _, _).
constrain(var(_, Es, Atr), T, R, P) :-
atrISet(Atr, Xs),
domain(L, Xs), subset(L, P), % `L': map with domain `Xs', contained in `P'
atrDstMap(Atr, map(T, R, L)),
domain(M, Xs),
atrSrcMap(Atr, map(U, E, M)), % `M': map with domain `Xs'
constrainList(Es, U, E, M).
constrain(fun(_, Es, Atr), T, R, P) :-
atrISet(Atr, Xs),
domain(L, Xs), subset(L, P), % `L': map with domain `Xs', contained in `P'
atrDstMap(Atr, map(T, R, L)),
domain(M, Xs),
atrSrcMap(Atr, map(U, E, M)), % `M': map with domain `Xs'
constrainList(Es, U, E, M).
constrainList([], _, _, _).
constrainList([E | Es], T, R, P) :-
constrain(E, T, R, P),
constrainList(Es, T, R, P).
Mapping arguments passed in to the predicates are those of the parent node. After completion of this phase all targets remain uninstantiated, but the constraints mentioned above have been implemented by unification of logic variables.
The instantiation of index map targets starts with variable reference nodes.
varMap(arr(_, E, _)) :- varMap(E).
varMap(idx(_)).
varMap(var(V, Es, Atr)) :-
% Attempt ``local reference mapping''...
atrSubsISets(Atr, Xss),
size(Es, R), % Compute rank of `V' from number of subscripts
atrSrcMap(Atr, map(V, R, M)),
mapSingletSubscripts(Xss, M, 1),
varMapList(Es).
varMap(var(_, Es, _)) :-
% ...or leave uninstantiated.
varMapList(Es).
varMap(fun(_, Es, _)) :- varMapList(Es).
varMapList([]).
varMapList([E | Es]) :- varMap(E), varMapList(Es).
mapSingletSubscripts([], _, _).
mapSingletSubscripts([Xs | Xss], M, D) :-
mapSingletSubscript(Xs, M, D),
D1 is D + 1, mapSingletSubscripts(Xss, M, D1).
mapSingletSubscript([X], M, D) :-
% Attempt to map index `X' to dimension `D'...
member(X -> D, M).
mapSingletSubscript(_, _, _).
% ...or leave mapping to `D' uninstantiated.
A local reference mapping of a variable reference node is one which takes the referenced variable as the template---a natural strategy.
The scheme doesn't require this strategy, because the expression node may have more indices than the rank of the variable, leading to undesirable, collapsing (sequentialisation).
E.g.
FORALL (x = 1 : n, y = 1 : n)
a (x, y) = v (x + y)
Rather than map the expression v (x + y) to the one-dimensional
template v, collapsing one of the indices, it would be probably
be better to map it directly to the two-dimensional template a.
So there is a second rule for mapping a variable, which leaves the target template uninstantiated at a node (deferring choice of template for the node to the subsequent propagation pass).
If a local reference mapping is selected, individual index to dimension mappings are selected on the basis of the index dependency of the subscript expression. If the D'th subscript was linear in index X, that would be a strong indication that X should be mapped to dimension D of the template. In the simplified exposition here, we haven't analysed linearity, but first claim to a particular dimension is given to indices appearing as the only index of subscript expressions in the relevant position.
E.g.
FORALL (x = 1 : n, y = 1 : n)
... a (x, f (y)) ...
favoured (source) mapping of a (x, f (y)) will be
map(a, 2, [x -> 1, y -> 2]).
E.g.
FORALL (x = 1 : n, y = 1 : n, z = 1 : n)
... a (x, f (y, z)) ...
favoured mapping of a (x, f (y, z)) will be
map(a, 2, [x -> 1, y -> _, z -> _]).The `y' and `z' mappings are uninstantiated by this phase.
E.g.
FORALL (x = 1 : n, y = 1 : n, z = 1 : n)
... a (x, x) ...
favoured mapping of a (x, x)) will be
map(a, 2, [x -> 1])first, then
map(a, 2, [x -> 2]).
In the course of backtracking, both possibilities will be enumerated by:
mapSingletSubscript([[x], [x]], [x -> D], 1).
Vertical propagation of a map is propagation across the arc in the expression graph emerging from a function reference or variable reference. Propagation is implemented by unifying targets in the source and destination maps of these nodes.
propagate(arr(_, E, _)) :- propagate(E).
propagate(idx(_)).
propagate(var(_, Es, Atr)) :-
propagateList(Es),
atrDstMap(Atr, MD),
atrSrcMap(Atr, MS),
propagateArc(MD, MS).
propagate(fun(_, Es, Atr)) :-
propagateList(Es),
atrDstMap(Atr, MD),
atrSrcMap(Atr, MS),
propagateArc(MD, MS).
propagateList([]).
propagateList([E | Es]) :- propagate(E), propagateList(Es).
propagateArc(MD, MD). % propagate mapping...
propagateArc(_, _). % ...or don't
The adoption of a postorder traversal here favours synthesised
over inheritted mappings.
The preceding phases generate many possible mappings [the art is
to order the rules so that favourable mappings emerge early in the
process of resolution]. These include many mappings where some map
targets are uninstantiated. The last phase eliminates any candidates
for which some nodes have uninstantiated target templates, eliminates
any mappings where two different indices are mapped to the same
dimension, and randomly instantiates any remaining uninstantiated
dimension mappings. In particular it may collapse certain mappings.
This is only allowed if the target is sufficiently large---we don't
want to favour unnecessary sequentialisation. The argument CRank
defines the smallest rank target template on which collapsing is
permitted.
select(arr(_, E, Atr), CRank) :-
atrNodMap(Atr, map(T, R, M)),
ground(T),
mapAll(M, R, CRank),
select(E, CRank).
select(idx(_), _).
select(var(_, Es, Atr), CRank) :-
atrSrcMap(Atr, map(T, R, M)),
ground(T),
mapAll(M, R, CRank),
selectList(Es, CRank).
select(fun(_, Es, Atr), CRank) :-
atrSrcMap(Atr, map(T, R, M)),
ground(T),
mapAll(M, R, CRank),
selectList(Es, CRank).
selectList([], _).
selectList([E | Es], CRank) :- select(E, CRank), selectList(Es, CRank).
mapAll(M, R, CRank) :-
R < CRank, !,
enum(R, Dims), % enum(N, [1, ..., N]).
inject(M, Dims).
mapAll(M, R, _) :-
enum(R, Dims),
injectOrCollapse(M, Dims).
inject([], _).
inject([_ -> D | M], Range) :-
member(D, Range),
remove(D, Range, Range1),
inject(M, Range1).
injectOrCollapse([], _).
injectOrCollapse([_ -> D | M], Range) :-
member(D, Range),
remove(D, Range, Range1),
injectOrCollapse(M, Range1).
injectOrCollapse([_ -> * | M], Range) :-
injectOrCollapse(M, Range).
A plausible choice for CRank is the largest rank of any program
variable appearing in the expression being mapped.
The CRank mechanism is not ideal, but more refined mechanisms look
prohibitively expensive. Some device is needed to suppress
over-eager collapsing of indices.
E.g.
% FORALL (x = 1 : n, y = 1 : n)
% a (x, y) = x + y
% ENDFORALL
arr x, y a:[x->1,y->2]
fun = a:[x->1,y->2] a:[x->1,y->2]
var a a:[x->1,y->2] a:[x->1,y->2]
idx x
idx y
fun + a:[x->1,y->2] a:[x->1,y->2]
idx x
idx y
Notes: mapping of LHS is `natural'. Terms on RHS inherit sensible
mappings.
E.g.
% FORALL (x = 1 : n, y = 1 : n)
% v (x + n * (y - 1)) = x + y
% ENDFORALL
arr x, y v:[x->1,y-> *]
fun = v:[x->1,y-> *] v:[x->1,y-> *]
var v v:[x->1,y-> *] v:[x->1,y-> *]
fun + v:[x->1,y-> *] v:[x->1,y-> *]
idx x
fun * v:[y-> *] v:[y-> *]
var n n:[] v:[]
fun - v:[y-> *] v:[y-> *]
idx y
var 1 1:[] v:[]
fun + v:[x->1,y-> *] v:[x->1,y-> *]
idx x
idx y
Notes: y collapsed throughout expression.
E.g.
% FORALL (j = r1 : n)
% u (j) = a (r, j) / l (r)
% ENDFORALL
arr j u:[j->1]
fun = u:[j->1] u:[j->1]
var u u:[j->1] u:[j->1]
idx j
fun / a:[j->2] u:[j->1]
var a a:[j->2] a:[j->2]
var r r:[] a:[]
idx j
var l l:[] a:[]
var r r:[] l:[]
Notes: arithmetic on RHS mapped to a. Not
owner computes! Remapping after /.
The model presented here omits very many features of HPF---it doesn't handle index ranges, intra-dimensional alignment including `embedded' alignments, array arguments, array-valued functions, variable to variable alignment, dummy argument alignment, etc.
Work on generalising the approach to include all these features is well-advanced.
It is apparent that the generate and test approach to mapping adopted here is subject to a combinatorial explosion. So far we have been unable to find a plausible mapping scheme which didn't involve essential backtracking, and potentially exponential complexity. However, there are many ways to prune the search tree of the naive implementation given here, and achieve the same or essentially the same results. We are optimistic that a practical scheme will emerge.
Complete Prolog from above notes.
?- op(700, yfx, ->).
% Index Dependency Analysis
depen(arr(Is, E, Atr), Xs) :-
depen(E, Ys),
union(Ys, Is, Zs), % Make sure all locally bound indices included
atrISet(Atr, Zs),
subtract(Zs, Is, Xs). % Locally bound indices not passed up to parent
depen(idx(X), [X]).
depen(var(_, Es, Atr), Xs) :-
depenList(Es, Xss),
atrSubsISets(Atr, Xss), % Save for future reference
unionList(Xss, Xs),
atrISet(Atr, Xs).
depen(fun(_, Es, Atr), Xs) :-
depenList(Es, Xss),
unionList(Xss, Xs),
atrISet(Atr, Xs).
depenList([], []).
depenList([E | Es], [Xs | Xss]) :-
depen(E, Xs),
depenList(Es, Xss).
% Horizontal Constraints
constrain(arr(Is, E, Atr), T, R, P) :- % T: variable, rank R. P: mappings
atrISet(Atr, Xs), subtract(Xs, Is, Ys),
% `Ys': idxs *not* bound by this arr node
domain(L, Ys), subset(L, P), % `L': map with domain `Ys', contained in `P'
domain(M, Xs), subset(L, M), % `M': map with domain `Xs', containing `L'
atrNodMap(Atr, map(T, R, M)),
constrain(E, T, R, M).
constrain(idx(_), _, _, _).
constrain(var(_, Es, Atr), T, R, P) :-
atrISet(Atr, Xs),
domain(L, Xs), subset(L, P), % `L': map with domain `Xs', contained in `P'
atrDstMap(Atr, map(T, R, L)),
domain(M, Xs),
atrSrcMap(Atr, map(U, E, M)), % `M': map with domain `Xs'
constrainList(Es, U, E, M).
constrain(fun(_, Es, Atr), T, R, P) :-
atrISet(Atr, Xs),
domain(L, Xs), subset(L, P), % `L': map with domain `Xs', contained in `P'
atrDstMap(Atr, map(T, R, L)),
domain(M, Xs),
atrSrcMap(Atr, map(U, E, M)), % `M': map with domain `Xs'
constrainList(Es, U, E, M).
constrainList([], _, _, _).
constrainList([E | Es], T, R, P) :-
constrain(E, T, R, P),
constrainList(Es, T, R, P).
% Variable Reference Mapping.
varMap(arr(_, E, _)) :- varMap(E).
varMap(idx(_)).
varMap(var(V, Es, Atr)) :-
% Attempt ``local reference mapping''...
atrSubsISets(Atr, Xss),
size(Es, R), % Compute rank of `V' from number of subscripts
atrSrcMap(Atr, map(V, R, M)),
mapSingletSubscripts(Xss, M, 1),
varMapList(Es).
varMap(var(_, Es, _)) :-
% ...or leave uninstantiated.
varMapList(Es).
varMap(fun(_, Es, _)) :- varMapList(Es).
varMapList([]).
varMapList([E | Es]) :- varMap(E), varMapList(Es).
mapSingletSubscripts([], _, _).
mapSingletSubscripts([Xs | Xss], M, D) :-
mapSingletSubscript(Xs, M, D),
D1 is D + 1, mapSingletSubscripts(Xss, M, D1).
mapSingletSubscript([X], M, D) :-
% Attempt to map index `X' to dimension `D'...
member(X -> D, M).
mapSingletSubscript(_, _, _).
% ...or leave mapping to `D' uninstantiated.
% Propagation of Mappings.
propagate(arr(_, E, _)) :- propagate(E).
propagate(idx(_)).
propagate(var(_, Es, Atr)) :-
propagateList(Es),
atrDstMap(Atr, MD),
atrSrcMap(Atr, MS),
propagateArc(MD, MS).
propagate(fun(_, Es, Atr)) :-
propagateList(Es),
atrDstMap(Atr, MD),
atrSrcMap(Atr, MS),
propagateArc(MD, MS).
propagateList([]).
propagateList([E | Es]) :- propagate(E), propagateList(Es).
propagateArc(MD, MD). % propagate mapping...
propagateArc(_, _). % ...or don't
% Range Injection.
select(arr(_, E, Atr), CRank) :-
atrNodMap(Atr, map(T, R, M)),
ground(T),
mapAll(M, R, CRank),
select(E, CRank).
select(idx(_), _).
select(var(_, Es, Atr), CRank) :-
atrSrcMap(Atr, map(T, R, M)),
ground(T),
mapAll(M, R, CRank),
selectList(Es, CRank).
select(fun(_, Es, Atr), CRank) :-
atrSrcMap(Atr, map(T, R, M)),
ground(T),
mapAll(M, R, CRank),
selectList(Es, CRank).
selectList([], _).
selectList([E | Es], CRank) :- select(E, CRank), selectList(Es, CRank).
mapAll(M, R, CRank) :-
R < CRank, !,
enum(R, Dims), % enum(N, [1, ..., N]).
inject(M, Dims).
mapAll(M, R, _) :-
enum(R, Dims),
injectOrCollapse(M, Dims).
inject([], _).
inject([_ -> D | M], Range) :-
member(D, Range),
remove(D, Range, Range1),
inject(M, Range1).
injectOrCollapse([], _).
injectOrCollapse([_ -> D | M], Range) :-
member(D, Range),
remove(D, Range, Range1),
injectOrCollapse(M, Range1).
injectOrCollapse([_ -> * | M], Range) :-
injectOrCollapse(M, Range).
%-----------------------------------------------------------------
% Miscellaneous ``set'' operations.
union([], X, X).
union([X | R], Y, Z) :- member(X, Y), !, union(R, Y, Z).
union([X | R], Y, [X | Z]) :- union(R, Y, Z).
subtract(M, [], M).
subtract(M, [X | I], N) :- remove(X, M, P), subtract(P, I, N).
member(X, [X | _]).
member(X, [_ | Y]) :- member(X, Y).
remove(_, [], []).
remove(X, [X | L], L) :- !.
remove(X, [Y | L], [Y | M]) :- remove(X, L, M).
subset([], _).
subset([X | M], N) :- member(X, N), !, subset(M, N).
size([], 0).
size([_ | R], N) :- ground(N), !, N > 0, N1 is N - 1, size(R, N1).
size([_ | R], N) :- size(R, N1), N is N1 + 1.
domain([], []).
domain([X -> _ | M], [X | Xs]) :- domain(M, Xs).
unionList([], []).
unionList([S | Ss], U) :- unionList(Ss, V), union(S, V, U).
enum(N, Is) :- enumFrom(1, N, Is).
enumFrom(I, N, [I | Is]) :- I =< N, !, I1 is I + 1, enumFrom(I1, N, Is).
enumFrom(_, _, []).
%-----------------------------------------------------------------
% Selector Predicates.
atrISet(atr(Xs, _, _, _), Xs).
atrSubsISets(atr(_, Xss, _, _), Xss).
atrNodMap(atr(_, _, M, _), M).
atrSrcMap(atr(_, _, M, _), M).
atrDstMap(atr(_, _, _, M), M).
%-----------------------------------------------------------------
% Run the whole mapping procedure
run(P, CRank) :-
depen(P, _),
constrain(P, _, _, _),
varMap(P),
propagate(P),
select(P, CRank).