1. Data Parallelism in Fortran 90

Introduction

Fortran is commonly derided by computer scientists. Being one of the earliest high-level languages to be developed (by John Backus of IBM in the mid 1950's), it certainly contains some archaic features that have fallen into disrepute in more recent times. However, Fortran is constantly evolving, via a succession of formal ANSI & ISO standards (Fortran 66, 77 and 90), as well as by vendor extensions, which subsequently provide input to the standardisation process.

Fortran 90 (which, incidentally, is now properly called just `Fortran') introduces a large number of new features and facilities to Fortran, as well as cleaning-up its syntactic and lexical form in various respects. It contains most of the features expected of a modern imperative programming language, as well as some not commonly found in other mainstream languages (in particular, its array syntax, which we shall concentrate on here). It is therefore possible to write clean, concise, elegant programs in Fortran. In addition, a primary objective of Fortran is that it should be efficiently executable on modern high-performance supercomputers (e.g. vector and SIMD architectures). This, and the large body of existing Fortran programs, explains why Fortran continues to be a popular language for large-scale scientific and engineering applications.

It is true that much existing Fortran software, having been developed over many years, contains many old deprecated features from early Fortran (so-called 'dusty deck' Fortran). Fortran 90 did not remove anything that was in Fortran 77, mainly for this reason, but it did mark some of these old features as `obsolescent', i.e. candidates for deletion from a future Fortran standard. Many of these will indeed be deleted in the next revision of Fortran in 1995.

Fortran 90 contains no explicitly parallel constructs or features (e.g. message-passing or task parallelism). However, it does contain features that implicitly express data parallelism, namely its array features. This lecture will not attempt to cover all of Fortran, but instead will concentrate on its data parallel features. The reader interested in a complete description of Fortran 90 is referred to the growing number of textbooks on the language (e.g. Michael Metcalf and John Reid, `Fortran 90 Explained', Oxford Science Publications, 1993, ISBN 0-19-853772-7).

More recently (in 1992-3) an informal standard called `High Performance Fortran' has been developed. This comprises a set of extensions to Fortran 90 to assist its implementation on parallel architectures, especially distributed memory message-passing systems. In particular, it includes directives for expressing data distribution across multiple processor memories. This will be the subject of the lecture after this.

Data parallelism

This is the application of an operation to multiple data items in parallel.

E.g., given the following declaration of 3 arrays of type REAL:

    REAL A (10), B (10), C (10)
the following array assignment statement:
    A = B + C
is equivalent to the set of 10 `elemental' assignment statements:
    A (1) = B (1) + C (1)
    A (2) = B (2) + C (2)
    ...
    A (10) = B (10) + C (10)
These elemental assignments can be performed in any order, and hence in parallel. (Incidentally, array indices are called subscripts in Fortran. Subscript values start at 1 unless otherwise specified).

Some terminology for arrays

rank:
number of dimensions;
vector:
a rank 1 array;
shape:
a vector listing the number of elements in each dimension of an array.
E.g:
    INTEGER  I (10)           ! rank 1, shape (10);  elements I(1)...I(10)
    REAL  A (8, 8), B (8, 8)  ! rank 2, shape (8,8)
    REAL  C (0:7, 0:7)        ! rank 2, shape (8,8)
('!' introduces a comment). In the last example we have explicitly specified the lower bound of each dimension to be 0, i.e. subscript values in each dimension are in the range 0...7. If not specified, the default value of the lower bound is 1.

Arrays are said to conform if they have the same shape, regardless of their bounds. E.g. In the above example, arrays A, B and C all have shape (8,8) and so conform.

Fortran 90 allows arrays to be used in expressions and assignments, provided they conform. In array expressions and assignments, elements correspond by the position of their subscripts in their subscript ranges rather than by subscript values. E.g., with the above declarations:

    A = B + C
is equivalent to:
    A (1,1) = B (1,1) + C (0,0)
    A (2,1) = B (2,1) + C (1,0)
    A (3,1) = B (3,1) + C (2,0)
    ...
    A (8,8) = B (8,8) + C (7,7)

A scalar (i.e. a data item that is not an array) can appear in an array expression, and conforms with any array. It is equivalent to `broadcasting' the scalar value to an array of the required shape. E.g.:

    INTEGER  A (10), B (0:9)

    A = 2 * B
means:
    A (1) = 2 * B (0)
    A (2) = 2 * B (1)
    ...
    A (10) = 2 * B (9)

Generally, in an assignment:

var = expr
the expr must conform with the var; i.e. the primaries in expr must have the same shape as var or be scalar.

Array sections

One can also reference a subset of the elements of an array, called an array section.

Regular section

A regular subset of elements, specified using a subscript triplet instead of a normal subscript. This has the form:

[l] : [u] [: s]
where l, u and s are expressions for: l = lower bound, u = upper bound, s = stride and [...] denotes an optional item. (N.B. [...] is not part of the syntax!).

E.g.:

    A (4:10)
is an array section composed of the sequence of elements:
    (/ A(4), A(5), ... A(10) /)
in this order. (N.B. (/ e1, e2, ...en /), where e1...en are scalar expressions, is an `array constructor', which creates a vector with element values e1...en.)
    A (4:10:2)  means  (/ A(4), A(6), A(8), A(10) /)
i.e. the elements of A from 4 to 10 with stride 2.

The stride can be negative:

    A (10:4:-2)  means  (/ A(10), A(8), A(6), A(4) /)
Any of l, u and `: s' can be omitted. Their defaults are:
l = lower bound of whole dimension
u = upper bound of whole dimension
s = 1
Thus, with declaration REAL A (10):
    A (:4)   means  (/ A(1), A(2), A(3), A(4) /)
    A (::2)  means  (/ A(1), A(3), A(5), A(7), A(9) /)
    A (:)    means the whole dimension (i.e. A itself)

Irregular section

An irregular subset of elements, specified using a vector subscript. E.g. If V is a vector of length 3, whose elements have the values (/5, 1, 2/) then A (V) means an array composed of the sequence of elements:

        (/ A(5), A(1), A(2) /)
in this order. Elements may be repeated in an irregular section.

A multi-dimensional array reference can contain different types of subscript in different dimensions. E.g.:

    B (3:9:2, 2, V)

Shape of an array section

The shape of an array section is given by the number of subscript values selected in each dimension with a subscript triplet or a vector subscript. Dimensions with just a scalar subscript are omitted in the determination of the rank and shape. E.g. consider the section:

    B (3:9:2, 2, V)
where V is a vector of length 3, say. Then:
    3:9:2  -> 4 subscript values (3,5,7,9)
    2      -> scalar subscript -- ignore it.
    V      -> vector of length 3, so 3 subscript values.
so its shape is (4,3), and it has rank 2.

Another example:

    INTEGER  V (2),  A (10,10,10)

    V = (/ 8, 1 /)

    ... A (3:5:2, 2, V) ...
is a section with shape (2,2). Its elements are:
    [ A (3,2,8)   A (3,2,1) ]
    [ A (5,2,8)   A (5,2,1) ]

Semantics of array assignment

In:

var = expr
the semantics are that expr is fully evaluated for all subscript values before any assignment takes place. Thus, if expr references array elements that are also in var, then expr always uses old values of the elements.

E.g.:

    A (2:3) = A (1:2)
is equivalent to:
    TEMP (1) = A (1)
    TEMP (2) = A (2)
    A (2)    = TEMP (1)
    A (3)    = TEMP (2)  ! gets *old* value of A(2), not the new one.
Thus, the new value of A is:
    (/ old A(1),  old A(1),  old A(2), ... /)
Note, this is not the same as the similar-looking DO-loop:
    DO i = 2,3
      A (i) = A (i-1)
    ENDDO
The semantics of a DO-loop are that its iterations are performed in sequence (unless analysis can show that another order is equivalent), so this DO-loop results in:
    A (2) = A (1)
then
    A (3) = A (2)   ! gets *new* value of A(2) == old value of A(1)
so the new value of A is:
    (/ old A(1),  old A(1),  old A(1), ... /)

Because of the semantics of array expressions and assignment, they are inherently data parallel: all elements of an array expression can be evaluated in parallel, and then all of the results can be assigned in parallel.

As another example:

    A (2:9) = 0.5 * (A (1:8) + A (3:9))
assigns to each element A(2) to A(9) the average of the old values of its nearest neighbours.

Logical array expressions

Relational operators:

    >    is greater that
    >=   is greater than or equal to
    <    is less than
    <=   is less than or equal to
    ==   equals
    /=   does not equal
These can be used in array expressions to provide a LOGICAL array result. E.g. if
    V = (/ 1, 0, -5, 3 /)

    (V > 0)  gives the logical array (/ .TRUE., .FALSE., .FALSE., .TRUE. /)
(.TRUE. and .FALSE. are the logical constants in Fortran). A logical array is often called a `mask' array.

Logical operators:

Fortran also has the logical operators .AND., .OR., .EQV. (equivalent) and .NEQV. (not equivalent).

WHERE statement and construct

One can mask an array assignment, so that the expression evaluation and assignment is only performed for certain elements, using a WHERE statement. E.g:

    REAL A (10), B (10)

    WHERE (A > 0.0)  B = 1.0 / A
only performs
    B(i) = 1.0 / A(i)
for subscripts `i' such that (A(i) > 0.0)

A WHERE-ELSEWHERE construct is also provided, to allow a single mask to control a sequence of assignments. E.g.:

    WHERE (A > 0.0)
      B = 1.0 / A
    ELSEWHERE
      B = 0.0
    ENDWHERE
Each branch can contain any number of array assignments, which must all conform with the mask expression. The assignments are performed in sequence. The ELSEWHERE part is optional.

Intrinsic functions

These are functions defined as part of the Fortran language. There are a large number of them (114 or 161, depending on the counting method!).

(a) Elemental intrinsic functions

These are defined for scalar arguments and return a scalar result.

However, they can also take an array argument of any shape, in which case they return an array result of the same shape, each element of which has the value that would be obtained by applying the function to the corresponding element of the argument.

Most of the intrinsic functions are of this nature, e.g. the mathematical intrinsic functions like SIN, COS, TAN:

    REAL A (10), B (10)

    A = SIN (B)
means:
    A (1) = SIN (B (1))
    A (2) = SIN (B (2))
    ...
    A (10) = SIN (B (10))

(b) Inquiry intrinsic functions

These are used to enquire about the properties of arrays, E.g. with REAL A (8,16):

    SIZE (A,1)    ! returns size of A in dimension 1, i.e. 8
    SIZE (A,2)    !    "       "    "       "      2, i.e. 16
    SIZE (A)      !    "    total number of elements in A, i.e. 128

    SHAPE (A)     ! returns shape of A, i.e. (/8, 16/)

    LBOUND (A,1)  ! returns the lower bound of A in dimension 1, i.e. 1

    UBOUND (A,1)  ! returns the upper bound of A in dimension 1, i.e. 1

(c) Reduction intrinsic functions

These take an array of any shape, and combine its elements under a particular operator to return an array of lower rank or a single element (`array reduction'). E.g., with REAL A (8,16)

    SUM (A)       ! returns sum of all elements of A
    SUM (A, 1)    ! applies `+' in dim 1:  SUM A(i,j);  result shape (16)
                  !                         i
    SUM (A, 2)    ! applies `+' in dim 2:  SUM A(i,j);  result shape (8)
                  !                         j
Similarly:
    PRODUCT (A)
    MAXVAL (A)
    MINVAL (A)

    ALL (logical-array-expr)  ! returns .TRUE. if all elements are .TRUE.,
                              ! .FALSE. otherwise (i.e. operator is .AND.)

    ANY (logical-array-expr)  ! returns .TRUE. if any element is .TRUE.,
                              ! .FALSE. otherwise (i.e. operator is .OR.)
All of these functions take an optional second `dimension' argument, like SUM.

Examples

(i) Array functions

User-defined functions in Fortran can return array results. The following is a function that returns the matrix product of an (m * n) matrix A with a vector V of length (n). (In fact there is also an intrinsic function MATMUL that performs this task!).

    FUNCTION matvec (A, V, m, n)
      INTEGER  m, n
      REAL  A (m,n), V (n), matvec (m)

      ! Returns  matvec, s.t. matvec (i) = SUM A(i,j) * V(j)
      !                                     j

      DO i = 1,m
        matvec (i) = SUM ( A(i,:) * V(:))
      ENDDO
    END FUNCTION
This array-valued function can of course be used in an array expression, e.g.:
    REAL  A (10,10), V (10)        ! ...and initialise them!

    V = matvec (A, V, 10, 10)

(ii) Solving Laplace's equation by Jacobi iteration

For the mathematically-minded, Laplace's equation is:

del-squared phi = 0

though this is unimportant for the purpose of understanding the example!

Here we consider Laplace's equation in 2 dimensions. This can be solved by discretising the field phi on a 2-d grid, i.e. by representing it as a 2d array of values (which we hereafter call A!). The boundary values of A are given, and the rest (the interior values) have to be computed.

Jacobi's solution method is basically:

The program is:

    PROGRAM jacobi
      REAL  a (100,100),  old_a (100,100)

      ...  set boundary values of A as required
      a (2:99, 2:99) = 0.0      ! initialise interior of A to 0
      old_a = 0.0               !     "      all of OLD_A to 0

      DO WHILE ( ANY (a - old_a > 1.0E-07 * a) )
        old_a = a
        a (2:99, 2:99) = 0.25 * (a (1:98, 2:99) + a (3:100, 2:99)  &
                               + a (2:99, 1:98) + a (2:99, 3:100))
      ENDDO
    END PROGRAM

(iii) Solving Laplace's equation by red-black relaxation

Red-black relaxation is a refinement of Jacobi iteration (example (ii)) which converges faster. The grid of points A is partitioned into a chequerboard of `even' and `odd' (or `red' and `black') points according to whether the sum of their subscript values is even or odd. It can be seen that all of the nearest neighbours of an even point are odd, and vice versa.

Each iteration is divided into 2 steps:

  1. every even point is updated (using the values at the odd points);
  2. similarly for the odd points.

The code is similar to (ii), except that a WHERE-ELSEWHERE construct is used to mask the updates so that they are applied to just even or odd points as appropriate.

    PROGRAM red_black
      REAL  a (100,100),  old_a (100,100)
      LOGICAL  even (100,100)

      ...  set boundary values of A as required
      a (2:99, 2:99) = 0.0      ! initialise interior of A to 0
      old_a = 0.0               !     "      all of OLD_A to 0

    ! initialise mask array EVEN
      DO i = 1,100
        DO j = 1,100
          even (i,j) = (MOD (i+j,2) == 0)
        ENDDO
      ENDDO

      DO WHILE ( ANY (a - old_a > 1.0E-07 * a) )
        old_a = a

    ! update even points

        WHERE (even (2:99, 2:99))
          a (2:99, 2:99) = 0.25 * (a (1:98, 2:99) + a (3:100, 2:99)  &
                                 + a (2:99, 1:98) + a (2:99, 3:100))
    ! update odd points

           ELSEWHERE
          a (2:99, 2:99) = 0.25 * (a (1:98, 2:99) + a (3:100, 2:99)  &
                                 + a (2:99, 1:98) + a (2:99, 3:100))
        ELSEWHERE
      ENDDO
    END PROGRAM


Continue in sequence, or back to the index.
Written by John Merlin (jhm@vcpc.univie.ac.at).
Converted to HTML by Simeon Warner on May 23 1994.