Next Previous Contents

11. Arrays

An array is a container object that can contain many values of one data type. Arrays are very useful objects and are indispensable for certain types of programming. The purpose of this chapter is to describe how arrays are defined and used in the S-Lang language.

11.1 Creating Arrays

The S-Lang language supports multi-dimensional arrays of all data types. Since the Array_Type is a data type, one can even have arrays of arrays. To create a multi-dimensional array of SomeType and assign to some variable, use:

      a = SomeType [dim0, dim1, ..., dimN];
Here dim0, dim1, ... dimN specify the size of the individual dimensions of the array. The current implementation permits arrays to contain as many as 7 dimensions. When a numeric array is created, all its elements are initialized to zero. The initialization of other array types depend upon the data type, e.g., the elements in String_Type and Struct_Type arrays are initialized to NULL.

As a concrete example, consider

     a = Integer_Type [10];
which creates a one-dimensional array of 10 integers and assigns it to a. Similarly,
     b = Double_Type [10, 3];
creates a 30 element array of double precision numbers arranged in 10 rows and 3 columns, and assigns it to b.

Range Arrays

There is a more convenient syntax for creating and initializing 1-d arrays. For example, to create an array of ten integers whose elements run from 1 through 10, one may simply use:

     a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
Similarly,
     b = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0];
specifies an array of ten doubles.

An even more compact way of specifying a numeric array is to use a range-array. For example,

     a = [0:9];
specifies an array of 10 integers whose elements range from 0 through 9. The syntax for the most general form of range array is given by
     [first-value : last-value : increment]
where the increment is optional and defaults to 1. This creates an array whose first element is first-value and whose successive values differ by increment. last-value sets an upper limit upon the last value of the array as described below.

If the range array [a:b:c] is integer valued, then the interval specified by a and b is closed. That is, the kth element of the array x_k is given by x_k=a+kc and satisfies a<=x_k<=b. Hence, the number of elements in an integer range array is given by the expression 1 + (b-a)/c.

The situation is somewhat more complicated for floating point range arrays. The interval specified by a floating point range array [a:b:c] is semi-open such that b is not contained in the interval. In particular, the kth element of [a:b:c] is given by x_k=a+kc such that a<=x_k<b when c>=0, and b<x_k<=a otherwise. The number of elements in the array is one greater than the largest k that satisfies the open interval constraint.

In contrast, a range-array expressed in the form [a:b:#n] represents an array of exactly n elements running from a to b inclusive. It is equivalent to a+[0:n-1]*(b-a)/(n-1).

Here are a few examples that illustrate the above comments:

       [1:5:1]         ==> [1,2,3,4,5]
       [1.0:5.0:1.0]   ==> [1.0, 2.0, 3.0, 4.0]
       [5:1:-1]        ==> [5,4,3,2,1]
       [5.0:1.0:-1.0]  ==> [5.0, 4.0, 3.0, 2.0];
       [1:1]           ==> [1]
       [1.0:1.0]       ==> []
       [1.0:1.0001]    ==> [1.0]
       [1:-3]          ==> []
       [0:1:#5]        ==> [0.0, 0.25, 0.50, 0.75, 1.0]
       [0:-1:#3]       ==> [0.0, -0.5, -1.0]

Currently Int_Type is the only integer type supported by range arrays--- arbitrary integer types will be supported in a future version. This means that [1h:5h] will not produce an array of Short_Type, rather it will produce an Int_Type array. However, [1h,2h,3h,4h,5h] will produce an array of Short_Type integers.

Creating arrays via the dereference operator

Another way to create an array is to apply the dereference operator @ to the DataType_Type literal Array_Type. The actual syntax for this operation resembles a function call

variable a = @Array_Type (data-type, integer-array);
where data-type is of type DataType_Type and integer-array is a 1-d array of integers that specify the size of each dimension. For example,
     variable a = @Array_Type (Double_Type, [10, 20]);
will create a 10 by 20 array of doubles and assign it to a. This method of creating arrays derives its power from the fact that it is more flexible than the methods discussed in this section. It is particularly useful for creating arrays during run-time in situations where the data-type can vary.

11.2 Reshaping Arrays

It is sometimes useful to change the `shape' of an array using the reshape function. For example, a 1-d 10 element array may be reshaped into a 2-d array consisting of 5 rows and 2 columns. The only restriction on the operation is that the arrays must be commensurate. The reshape function follows the syntax

reshape (array-name, integer-array);
where array-name specifies the array to be reshaped to the dimensions given by integer-array, a 1-dimensional array of integers. It is important to note that this does not create a new array, it simply reshapes the existing array. Thus,
     variable a = Double_Type [100];
     reshape (a, [10, 10]);
turns a into a 10 by 10 array, as well as any other variables attached to the array.

The _reshape function works like reshape except that it creates a new array instead of changing the shape of an existing array:

     new_a = _reshape (a, [10,10]);

11.3 Simple Array Indexing

An individual element of an array may be referred to by its index. For example, a[0] specifies the zeroth element of the one dimensional array a, and b[3,2] specifies the element in the third row and second column of the two dimensional array b. As in C, array indices are numbered from 0. Thus if a is a one-dimensional array of ten integers, the last element of the array is given by a[9]. Using a[10] would result in an IndexError exception.

A negative index may be used to index from the end of the array, with a[-1] referring to the last element of a. Similarly, a[-2] refers to the next to the last element, and so on.

One may use the indexed value like any other variable. For example, to set the third element of an integer array to 6, use

     a[2] = 6;
Similarly, that element may be used in an expression, such as
     y = a[2] + 7;
Unlike other S-Lang variables which inherit a type upon assignment, array elements already have a type and any attempt to assign a value with an incompatible type will result in a TypeMismatchError exception. For example, it is illegal to assign a string value to an integer array.

One may use any integer expression to index an array. A simple example that computes the sum of the elements of a 10 element 1-d array is

      variable i, s;
      s = 0;
      for (i = 0; i < 10; i++) s += a[i];
(In practice, do not carry out sums this way--- use the sum function instead, which is much simpler and faster, i.e., s=sum(a)).

11.4 Indexing Multiple Elements with Ranges

Unlike many other languages, S-Lang permits arrays to be indexed by other integer arrays. Suppose that a is a 1-d array of 10 doubles. Now consider:

      i = [6:8];
      b = a[i];
Here, i is a 1-dimensional range array of three integers with i[0] equal to 6, i[1] equal to 7, and i[2] equal to 8. The statement b = a[i]; will create a 1-d array of three doubles and assign it to b. The zeroth element of b, b[0] will be set to the sixth element of a, or a[6], and so on. In fact, these two simple statements are equivalent to
     b = Double_Type [3];
     b[0] = a[6];
     b[1] = a[7];
     b[2] = a[8];
except that using an array of indices is not only much more convenient, but executes much faster.

More generally, one may use an index array to specify which elements are to participate in a calculation. For example, consider

     a = Double_Type [1000];
     i = [0:499];
     j = [500:999];
     a[i] = -1.0;
     a[j] = 1.0;
This creates an array of 1000 doubles and sets the first 500 elements to -1.0 and the last 500 to 1.0. Actually, one may do away with the i and j variables altogether and use
     a = Double_Type [1000];
     a[[0:499]] = -1.0;
     a[[500:999]] = 1.0;
It is important to note that the syntax requires the use of the double square brackets, and in particular that a[[0:499]] is not the same as a[0:499]. In fact, the latter will generate a syntax error.

Index-arrays are not contrained to be one-dimensional arrays. Suppose that I represents a multidimensional index array, and that A is the array to be indexed. Then what does A[I] represent? Its value will be an array of the same type as A, but with the dimensionality of I. For example,

    a = 1.0*[1:10];
    i = _reshape ([4,5,6,7,8,9], [2,3]);
defines a to be a 10 element array of doubles, and i to be 2x3 array of integers. Then a[i] will be a 2x3 array of doubles with elements:
    a[4]   a[5]   a[6]
    a[7]   a[8]   a[9]

Often, it is convenient to use a ``rubber'' range to specify indices. For example, a[[500:]] specifies all elements of a whose index is greater than or equal to 500. Similarly, a[[:499]] specifies the first 500 elements of a. Finally, a[[:]] specifies all the elements of a. The latter form may also be written as a[*].

One should be careful when using index arrays with negative elements. As pointed out above, a negative index is used to index from the end of the array. That is, a[-1] refers to the last element of a. How should a[[[0:-1]] be interpreted?

In version 1 of the interpreter, when used in an array indexing context, a construct such as [0:-1] was taken to mean from the first element through the last. While this might seem like a convenient shorthand, in retrospect it was a bad idea. For this reason, the meaning of a ranges over negative valued indices was changed in version 2 of the interpreter as follows: First the index-range gets expanded to an array of indices according to the rules for range arrays described above. Then if any of the resulting indices are negative, they are interpreted as indices from the end of the array. For example, if a is an array of 10 elements, then a[[-2:3]] is first expanded to a[[-2,-1,0,1,2,3]], and then to the 6 element array

    [ a[8], a[9], a[0], a[1], a[2], a[3] ]
So, what does a[[0:-1]] represent in the new interpretation? Since [0:-1] expands to an empty array, a[[0:-1]] will also produce an empty array.

Indexing of multidimensional arrays using ranges works similarly. Suppose a is a 100 by 100 array of doubles. Then the expression a[0, *] specifies all elements in the zeroth row. Similarly, a[*, 7] specifies all elements in the seventh column. Finally, a[[3:5],[6:12]] specifies the 3 by 7 region consisting of rows 3, 4, and 5, and columns 6 through 12 of a.

Before leaving this section, a few examples are presented to illustrate some of these points.

The ``trace'' of a matrix is an important concept that occurs frequently in linear algebra. The trace of a 2d matrix is given by the sum of its diagonal elements. Consider the creation of a function that computes the trace of such a matrix.

The most straightforward implementation of such a function uses an explicit loop:

      define array_trace (a, n)
      {
         variable s = 0, i;
         for (i = 0; i < n; i++) s += a[i, i];
         return s;
      }
Better yet is to recognize that the diagonal elements of an n by n array are given by an index array I with elements 0, n+1, 2*n+2, ..., n*n-1, or more precisely as
     [0:n*n-1:n+1]
Hence the above may be written more simply as
     define array_trace (a, n)
     {
        return sum (a[[0:n*n-1:n+1]]);
     }

The following example creates a 10 by 10 integer array, sets its diagonal elements to 5, and then computes the trace of the array:

      a = Integer_Type [10, 10];
      a[[0:99:11]] = 5;
      the_trace = array_trace(a, 10);

In the previous examples, the size of the array was passed as an additional argument. This is unnecessary because the size may be obtained from array itself by using the array_shape function. For example, the following function may be used to obtain the indices of the diagonal element of an array:

     define diag_indices (a)
     {
        variable dims = array_shape (a);
        if (length (dims) != 2)
          throw InvalidParmError, "Expecting a 2d array";
        if (dims[0] != dims[1])
          throw InvalidParmError, "Expecting a square array";
        variable n = dims[0];
        return [0:n*(n-1):n+1];
     }
Using this function, the trace function may be written more simply as
     define array_trace (a)
     {
        return sum (a[diag_indices(a)]);
     }
Another example of this technique is a function that creates an n by n unit matrix:
      define unit_matrix (n)
      {
         variable a = Int_Type[n, n];
         a[diag_indices(a)] = 1;
         return a;
      }

11.5 Arrays and Variables

When an array is created and assigned to a variable, the interpreter allocates the proper amount of space for the array, initializes it, and then assigns to the variable a reference to the array. So, a variable that represents an array has a value that is really a reference to the array. This has several consequences, most good and some bad. It is believed that the advantages of this representation outweigh the disadvantages. First, we shall look at the positive aspects.

When a variable is passed to a function, it is always the value of the variable that gets passed. Since the value of a variable representing an array is a reference, a reference to the array gets passed. One major advantage of this is rather obvious: it is a fast and efficient way to pass the array. This also has another consequence that is illustrated by the function

      define init_array (a)
      {
         variable i;
         variable n = length(a);
         _for i (0, n-1, 1)
           a[i] = some_function (i);
      }
where some_function is a function that generates a scalar value to initialize the ith element. This function can be used in the following way:
      variable X = Double_Type [100000];
      init_array (X);
Since the array is passed to the function by reference, there is no need to make a separate copy of the 100000 element array. As pointed out above, this saves both execution time and memory. The other salient feature to note is that any changes made to the elements of the array within the function will be manifested in the array outside the function. Of course, in this case this is a desirable side-effect.

To see the downside of this representation, consider:

      a = Double_Type [10];
      b = a;
      a[0] = 7;
What will be the value of b[0]? Since the value of a is really a reference to the array of ten doubles, and that reference was assigned to b, b also refers to the same array. Thus any changes made to the elements of a, will also be made implicitly to b.

This begs the question: If the assignment of a variable attached to an an array to another variable results in the assignment of the same array, then how does one make separate copies of the array? There are several answers including using an index array, e.g., b = a[*]; however, the most natural method is to use the dereference operator:

      a = Double_Type [10];
      b = @a;
      a[0] = 7;
In this example, a separate copy of a will be created and assigned to b. It is very important to note that S-Lang never implicitly dereferences an object. So, one must explicitly use the dereference operator. This means that the elements of a dereferenced array are not themselves dereferenced. For example, consider dereferencing an array of arrays, e.g.,
      a = Array_Type [2];
      a[0] = Double_Type [10];
      a[1] = Double_Type [10];
      b = @a;
In this example, b[0] will be a reference to the array that a[0] references because a[0] was not explicitly dereferenced.

11.6 Using Arrays in Computations

Many functions and operations work transparently with arrays. For example, if a and b are arrays, then the sum a + b is an array whose elements are formed from the sum of the corresponding elements of a and b. A similar statement holds for all other binary and unary operations.

Let's consider a simple example. Suppose, that we wish to solve a set of n quadratic equations whose coefficients are given by the 1-d arrays a, b, and c. In general, the solution of a quadratic equation will be two complex numbers. For simplicity, suppose that all we really want is to know what subset of the coefficients, a, b, c, correspond to real-valued solutions. In terms of for loops, we can write:

     index_array = Char_Type [n];
     _for i (0, n-1, 1)
       {
          d = b[i]^2 - 4 * a[i] * c[i];
          index_array [i] = (d >= 0.0);
       }
In this example, the array index_array will contain a non-zero value if the corresponding set of coefficients has a real-valued solution. This code may be written much more compactly and with more clarity as follows:
     index_array = ((b^2 - 4 * a * c) >= 0.0);
Moreover, it executes about 20 times faster than the version using an explicit loop.

S-Lang has a powerful built-in function called where. This function takes an array of boolean values and returns an array of indices that correspond to where the elements of the input array are non-zero. The utility of this simple operation cannot be overstated. For example, suppose a is a 1-d array of n doubles, and it is desired to set all elements of the array whose value is less than zero to zero. One way is to use a for loop:

     _for i (0, n-1, 1)
       if (a[i] < 0.0) a[i] = 0.0;
If n is a large number, this statement can take some time to execute. The optimal way to achieve the same result is to use the where function:
     a[where (a < 0.0)] = 0;
Here, the expression (a < 0.0) returns a boolean array whose dimensions are the same size as a but whose elements are either 1 or 0, according to whether or not the corresponding element of a is less than zero. This array of zeros and ones is then passed to the where function, which returns a 1-d integer array of indices that indicate where the elements of a are less than zero. Finally, those elements of a are set to zero.

Consider once more the example involving the set of n quadratic equations presented above. Suppose that we wish to get rid of the coefficients of the previous example that generated non-real solutions. Using an explicit for loop requires code such as:

     nn = 0;
     _for i (0, n-1, 1)
       if (index_array [i]) nn++;

     tmp_a = Double_Type [nn];
     tmp_b = Double_Type [nn];
     tmp_c = Double_Type [nn];

     j = 0;
     _for i (0, n-1, 1)
       {
          if (index_array [i])
            {
               tmp_a [j] = a[i];
               tmp_b [j] = b[i];
               tmp_c [j] = c[i];
               j++;
            }
       }
     a = tmp_a;
     b = tmp_b;
     c = tmp_c;
Not only is this a lot of code, making it hard to digest, but it is also clumsy and error-prone. Using the where function, this task is trivial and executes in a fraction of the time:
     i = where (index_array != 0);
     a = a[i];
     b = b[i];
     c = c[i];

Most of the examples up till now assumed that the dimensions of the array were known. Although the intrinsic function length may be used to get the total number of elements of an array, it cannot be used to get the individual dimensions of a multi-dimensional array. The array_shape function may be used to determine the dimensionality of an array. It may be used to determine the number of rows of an array as follows:

     define num_rows (a)
     {
        return array_shape (a)[0];
     }
The number of columns may be obtained in a similar manner:
     define num_cols (a)
     {
        variable dims = array_shape (a);
        if (length(dims) > 1) return dims[1];
        return 1;
     }
The array_shape function may also be used to create an array that has the same number of dimensions as another array:
     define make_int_array (a)
     {
        return @Array_Type (Int_Type, array_shape (a));
     }

Finally, the array_info function may be used to get additional information about an array, such as its data type and size.

11.7 Arrays of Arrays: A Cautionary Note

Sometimes it is desirable to create an array of arrays. For example,

     a = Array_Type[3];
     a[0] = [1:10];
     a[1] = [1:100];
     a[2] = [1:1000];
will produce an array of the 3 arrays [1:10], [1:100], and [1:1000]. Index arrays may be used to access elements of an array of arrays: a[[1,2]] will produce an array of arrays that consists of the elements a[1] and a[2]. However, it is important to note that setting the elements of an array of arrays via an index array does not work as one might naively expect. Consider the following:
     b = Array_Type[3];
     b[*] = a[[2,1,0]];
where a is the array of arrays given in the previous example. The reader might expect b to have elements b[0]=a[2], b[1]=a[1], and b[2]=a[0], and be surprised to learn that b[0]=b[1]=b[2]=a[[2,1,0]]. The reason for this is that, by definition, b is an array of arrays, and even though a[[2,1,0]] is an array of arrays, it is first and foremost an array, and it is that array that is assigned to the elements of b.


Next Previous Contents