Next Previous Contents

13. Structures and User-Defined Types

Struct_Type

A structure is a heterogeneous container object, i.e., it is an object with elements whose values do not have to be of the same data type. The elements or fields of a structure are named, and one accesses a particular field of the structure via the field name. This should be contrasted with an array whose values are of the same type, and whose elements are accessed via array indices.

A user-defined data type is a structure with a fixed set of fields defined by the user.

13.1 Defining a Structure

The struct keyword is used to define a structure. The syntax for this operation is:

struct {field-name-1, field-name-2, ... field-name-N};
This creates and returns a structure with N fields whose names are specified by field-name-1, field-name-2, ..., field-name-N. When a structure is created, the values of its fields are initialized to NULL.

For example,

     variable t = struct { city_name, population, next };
creates a structure with three fields and assigns it to the variable t.

Alternatively, a structure may be created by dereferencing Struct_Type. Using this technique, the above structure may be created using one of the two forms:

      t = @Struct_Type ("city_name", "population", "next");
      t = @Struct_Type (["city_name", "population", "next"]);
This approach is useful when creating structures dynamically where one does not know the name of the fields until run-time.

Like arrays, structures are passed around by reference. Thus, in the above example, the value of t is a reference to the structure. This means that after execution of

     u = t;
both t and u refer to the same underlying structure, since only the reference was copied by the assignment. To actually create a new copy of the structure, use the dereference operator, e.g.,
     variable u = @t;
It create new structure whose field names are identical to the old and copies the field values to the new structure. If any of the values are objects that are passed by reference, then only the references will be copied. In other words,
      t = struct{a};
      t.a = [1:10];
      u = @t;
will produce a structure u that references the same array as t.

13.2 Accessing the Fields of a Structure

The dot (.) operator is used to specify the particular field of structure. If s is a structure and field_name is a field of the structure, then s.field_name specifies that field of s. This specification can be used in expressions just like ordinary variables. Again, consider

     t = struct { city_name, population, next };
described in the last section. Then,
     t.city_name = "New York";
     t.population = 13000000;
     if (t.population > 200) t = t.next;
are all valid statements involving the fields of t.

13.3 Linked Lists

One of the most important uses of structures is the creation of dynamic data structures such as linked-lists. A linked-list is simply a chain of structures that are linked together such that one structure in the chain is the value of a field of the previous structure in the chain. To be concrete, consider the structure discussed earlier:

     t = struct { city_name, population, next };
and suppose that it is desired to create a linked-list of such objects to store population data. The purpose of the next field is to provide the link to the next structure in the chain. Suppose that there exists a function, read_next_city, that reads city names and populations from a file. Then the list may be created using:
     define create_population_list ()
     {
        variable city_name, population, list_root, list_tail;
        variable next;

        list_root = NULL;
        while (read_next_city (&city_name, &population))
          {
             next = struct {city_name, population, next };

             next.city_name = city_name;
             next.population = population;
             next.next = NULL;

             if (list_root == NULL)
               list_root = next;
             else
               list_tail.next = next;

             list_tail = next;
          }
        return list_root;
     }
In this function, the variables list_root and list_tail represent the beginning and end of the list, respectively. As long as read_next_city returns a non-zero value, a new structure is created, initialized, and then appended to the list via the next field of the list_tail structure. On the first time through the loop, the list is created via the assignment to the list_root variable.

This function may be used as follows:

    Population_List = create_population_list ();
    if (Population_List == NULL)
      throw RunTimeError, "List is empty";
Other functions may be created that manipulate the list. Here is one that finds the city with the largest population:
    define get_largest_city (list)
    {
       variable largest;

       largest = list;
       while (list != NULL)
         {
            if (list.population > largest.population)
              largest = list;
            list = list.next;
         }
       return largest.city_name;
    }

    vmessage ("%s is the largest city in the list",
               get_largest_city (Population_List));
The get_largest_city is a typical example of how one traverses a linear linked-list by starting at the head of the list and successively moves to the next element of the list via the next field.

In the previous example, a while loop was used to traverse the linked list. It is also possible to use a foreach loop for this:

    define get_largest_city (list)
    {
       variable largest, elem;

       largest = list;
       foreach elem (list)
         {
            if (elem.population > largest.population)
              largest = elem;
         }
       return largest.city_name;
    }
Here a foreach loop has been used to walk the list via its next field. If the field name linking the elements was not called next, then it would have been necessary to use the using form of the foreach statement. For example, if the field name implementing the linked list was next_item, then
     foreach item (list) using ("next_item")
       {
          .
          .
       }
would have been used. In other words, unless otherwise indicated via the using clause, foreach walks the list using a field named next by default.

Now consider a function that sorts the list according to population. To illustrate the technique, a bubble-sort will be used, not because it is efficient (it is not), but because it is simple, intuitive, and provides another example of structure manipulation:

    define sort_population_list (list)
    {
       variable changed;
       variable node, next_node, last_node;
       do
         {
            changed = 0;
            node = list;
            next_node = node.next;
            last_node = NULL;
            while (next_node != NULL)
              {
                 if (node.population < next_node.population)
                   {
                      % swap node and next_node
                      node.next = next_node.next;
                      next_node.next = node;
                      if (last_node != NULL)
                        last_node.next = next_node;

                      if (list == node) list = next_node;
                      node = next_node;
                      next_node = node.next;
                      changed++;
                   }
                 last_node = node;
                 node = next_node;
                 next_node = next_node.next;
              }
         }
       while (changed);

       return list;
    }
Note the test for equality between list and node, i.e.,
                      if (list == node) list = next_node;
It is important to appreciate the fact that the values of these variables are references to structures, and that the comparison only compares the references and not the actual structures they reference. If it were not for this, the algorithm would fail.

13.4 Defining New Types

A user-defined data type may be defined using the typedef keyword. In the current implementation, a user-defined data type is essentially a structure with a user-defined set of fields. For example, in the previous section a structure was used to represent a city/population pair. We can define a data type called Population_Type to represent the same information:

      typedef struct
      {
         city_name,
         population
      } Population_Type;
This data type can be used like all other data types. For example, an array of Population_Type types can be created,
      variable a = Population_Type[10];
and `populated' via expressions such as
      a[0].city_name = "Boston";
      a[0].population = 2500000;
The new type Population_Type may also be used with the typeof function:
      if (Population_Type == typeof (a))
        city = a.city_name;
The dereference @ may be used to create an instance of the new type:
     a = @Population_Type;
     a.city_name = "Calcutta";
     a.population = 13000000;

Another feature that user-defined types possess is that the action of the binary and unary operations may be defined for them. This idea is discussed in more detail below.

13.5 Operator Overloading

The binary and unary operators may be extended to user-defined types. To illustrate how this works, consider a data type that represents a vector in 3-space:

    typedef struct { x, y, z } Vector_Type;
and a function that instantiates such an object:
    define vector_new (x, y, z)
    {
       variable v = @Vector_Type;
       v.x = double(x); v.y = double(y); v.z = double(z);
       return v;
    }
This function may be used to define a function that adds two vectors together:
    define vector_add (v1, v2)
    {
       return vector_new (v1.x+v2.x, v1.y+v2.y, v1.z+v2.z);
    }
Using these functions, three vectors representing the points (2,3,4), (6,2,1), and (-3,1,-6) may be created using
   V1 = vector_new (2,3,4);
   V2 = vector_new (6,2,1);
   V3 = vector_new (-3,1,-6);
and then added together via
   V4 = vector_add (V1, vector_add (V2, V3));
The problem with the last statement is that it is not a very natural way to express the addition of three vectors. It would be far better to extend the action of the binary + operator to the Vector_Type objects and then write the above sum more simply as
   V4 = V1 + V2 + V3;

The __add_binary function defines the result of a binary operation between two data types:

__add_binary (op, result-type, funct, typeA,typeB);
Here, op is a string representing any one of the binary operators ("+", "-", "*", "/", "==",...), and funct is reference to a function that carries out the binary operation between objects of types typeA and typeB to produce an object of type result-type.

This function may be used to extend the + operator to Vector_Type objects:

    __add_binary ("+", Vector_Type, &vector_add, Vector_Type, Vector_Type);
Similarly the subtraction and equality operators may be extended to Vector_Type via
    define vector_minus (v1, v2)
    {
       return vector_new (v1.x-v2.x, v1.y-v2.y, v1.z-v2.z);
    }
    __add_binary ("-", Vector_Type, &vector_minus, Vector_Type, Vector_Type);

    define vector_eqs (v1, v2)
    {
       return (v1.x==v2.x) and (v1.y==v2.y) and (v1.z==v2.z);
    }
    __add_binary ("==", Char_Type, &vector_eqs, Vector_Type, Vector_Type);
permitting a statement such as
    if (V2 != V1) V3 = V2 - V1;
The - operator is also an unary operator that is customarily used to change the sign of an object. Unary operations may be extended to Vector_Type objects using the __add_unary function:
   define vector_chs (v)
   {
      return vector_new (-v.x, -v.y, -v.z);
   }
   __add_unary ("-", Vector_Type, &vector_chs, Vector_Type);
A trivial example of the use of the unary minus is V4 = -V2.

It is interesting to consider the extension of the multiplication operator * to Vector_Type. A vector may be multiplied by a scalar to produce another vector. This can happen in two ways as reflected by the following functions:

   define vector_scalar_mul (v, a)
   {
      return vector_new (a*v.x, a*v.y, a*v.z);
   }
   define scalar_vector_mul (a, v)
   {
      return vector_new (a*v.x, a*v.y, a*v.z);
   }
Here a represents the scalar, which can be any object that may be multiplied by a Double_Type, e.g., Int_Type, Float_Type, etc. Instead of using multiple statements involving __add_binary to define the action of Int_Type+Vector_Type, Float_Type+Vector_Type, etc, a single statement using Any_Type to represent a ``wildcard'' type may be used:
   __add_binary ("*", Vector_Type, &vector_scalar_mul, Vector_Type, Any_Type);
   __add_binary ("*", Vector_Type, &scalar_vector_mul, Any_Type, Vector_Type);
There are a couple of natural possibilities for Vector_Type*Vector_Type: The cross-product defined by
   define crossprod (v1, v2)
   {
      return vector_new (v1.y*v2.z-v1.z*v2.y,
                         v1.z*v2.x-v1.x*v2.z,
                         v1.x*v2.y-v1.y*v2.x);
   }
and the dot-product:
   define dotprod (v1, v2)
   {
      return v1.x*v2.x + v1.y*v2.y + v1.z*v2.z;
   }
The binary * operator between two vector types may be defined to be just one of these functions--- it cannot be extended to both. If the dot-product is chosen then one would use
   __add_binary ("*", Double_Type, &dotprod, Vector_Type_Type, Vector_Type);

Just because it is possible to define the action of a binary or unary operator on an user-defined type, it is not always wise to do so. A useful rule of thumb is to ask whether defining a particular operation leads to more readable and maintainable code. For example, simply looking at

   c = a + b;
in isolation one can easily overlook the fact that a function such as vector_add may be getting executed. Moreover, in cases where the action is ambiguous such as Vector_Type*Vector_Type it may not be clear what
   c = a*b;
means unless one knows exactly what choice was made when extending the * operator to the types. For this reason it may be wise to leave Vector_Type*Vector_Type undefined and use ``old-fashioned'' function calls such as
   c = dotprod (a, b);
   d = crossprod (a, b);
to avoid the ambiguity altogether.

Finally, the __add_string function may be used to define the string representation of an object. Examples involving the string representation include:

    message ("The value is " + string (V));
    vmessage ("The result of %S+%S is %S", V1, V1, V1+V2);
    str = "The value of V is $V"$;
For the Vector_Type one might want to use the string represention generated by
   define vector_string (v)
   {
      return sprintf ("(%S,%S,%S)", v.x, v.y, v.z);
   }
   __add_string (Vector_Type, &vector_string);


Next Previous Contents