12.1 Pointers

Pointers were briefly mentioned in section 8.3.1 in the discussion of the contents of the pseudo-module SYSTEM, which uses the definition:

  ADDRESS = POINTER TO LOC;

Because this definition is phrased as a specific instance of the pointer type, it should be apparent that a pointer may be something more general than the address of a LOC (smallest storage location). In fact a pointer may hold the location of any data. This is only a slight conceptual generalization, however, as the pointer will still hold the address of a LOC--the first unit of storage belonging to the data in question.

A pointer or reference variable identifies a memory location that holds the address of some other entity. It points to that other entity.

Although an addressable location might not be called a LOC in other programming notations, this definition is a general one, and is not specific to Modula-2.

12.1.1 Pointer Variables

Of course in Modula-2, identifiers for pointer variables have to be declared using the usual syntax, for instance, for a type whose entities will point to integers:

  TYPE IntPoint = POINTER TO INTEGER;
  VAR
    iPoint : IntPoint;
    int : INTEGER;

Following these declarations, assignments could be made such as:

  iPoint := SYSTEM.ADR (int);

Conceptually, items of the type IntPoint point to an entire integer, whatever number of memory locations an integer occupies. On the other hand, there is a sense in which all pointers are the same type (ADDRESS) even though conceptually each pointer type is different, depending on the type of data it points to. Thus the Modula-2 compatibility rule is:

Items of any pointer type are assignment compatible with the type SYSTEM.ADDRESS. Two different pointer types are not assignment compatible with each other, but one can be CAST to the other if required.

To illustrate, if one also had

  TYPE
    RealPoint = POINTER TO REAL;
  VAR
    rPoint, rPoint2 : RealPoint;
    re : REAL;
    adr : SYSTEM.ADDRESS;

then the following are all legal:

  adr := iPoint;
  adr := rPoint;
  rPoint := SYSTEM.CAST (RealPoint, iPoint); (* must have a good reason for this *)

but the following are both illegal because the types pointed to are incompatible, and therefore so are the pointer types:

  iPoint := rPoint;
  rPoint := iPoint;

Pointer types may, as in the examples, point to numeric entities, but are not themselves numeric, and therefore none of the usual numeric operations (+ - * /) can be performed on variables of these types. Two pointers of the same type can be compared to one another using either "=" or "#" but not with "<" or any other comparison operators. Thus,

  IF rPoint = rPoint2

is legal, but

  IF rPoint < iPoint

is not allowed.
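
The rules of this section can be gathered into one small module. What follows is only a sketch, assuming an ISO-style compiler in which ADDRESS, ADR and CAST are exported by SYSTEM; the module name PointerCompat and the variable same are invented for the illustration. The statements that the rules forbid are left inside comments so that the module still compiles.

  MODULE PointerCompat;

  IMPORT SYSTEM;

  TYPE
    IntPoint = POINTER TO INTEGER;
    RealPoint = POINTER TO REAL;

  VAR
    int : INTEGER;
    re : REAL;
    iPoint : IntPoint;
    rPoint, rPoint2 : RealPoint;
    adr : SYSTEM.ADDRESS;
    same : BOOLEAN;   (* hypothetical helper variable *)

  BEGIN
    iPoint := SYSTEM.ADR (int);
    rPoint := SYSTEM.ADR (re);
    rPoint2 := rPoint;            (* same pointer type: legal *)
    adr := iPoint;                (* any pointer type to ADDRESS: legal *)
    adr := rPoint;
    same := (rPoint = rPoint2);   (* "=" and "#" are legal; TRUE here *)

    (* Rejected by the compiler:
         iPoint := rPoint;             different pointer types
         same := rPoint < rPoint2;     no ordering on pointers
         rPoint := rPoint + rPoint2;   no arithmetic on pointers   *)

    (* Legal, but rPoint2^ must not then be treated as a REAL,
       because the memory pointed to holds an INTEGER. *)
    rPoint2 := SYSTEM.CAST (RealPoint, iPoint);
  END PointerCompat.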

12.1.2 Pointer References

It is worth observing that the memory pointed to by a pointer variable has a type of its own, but not a name of its own; it may be regarded as anonymous. Actual references to the memory pointed to are made by using the pointer name, followed by the symbol "^".

Using a pointer to access the memory to which it points is called dereferencing the pointer.

In the cases shown, one could initialize the contents of a memory location pointed to in any of the following ways (among others):

  iPoint^ := int;
  iPoint^ := -5;
  rPoint^ := 3.24E-7;
  SRealIO.ReadReal (rPoint2^);

NOTES: 1. On some older systems, the ^ or caret character was written as the up-arrow character ↑, but on most, these are two distinct characters. Where both exist, the one desired here is the caret.

2. The name caret is often shortened to hat and one then pronounces point^ as point-hat.

3. Recall that an ADDRESS is a POINTER TO LOC. Thus, if Ad is of type ADDRESS, then Ad^ is of type LOC.

It is easiest to remember the meaning of point^ if one thinks of it as "the thing pointed to by point". It must be kept very clear that point and point^ are two entirely different entities: the first is the name of the pointer variable whose contents are the number or address of a memory location, and the second is the name of the entity situated at that location. So, when point is declared, only enough memory for the pointer is set aside. The space for point^ must be obtained separately, either by declaring an entity of the type that can be pointed to, or by executing code that allocates memory at run time (see section 12.5).

The Modula-2 symbol "^" when affixed to an identifier is called the dereferencing operator.
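
The difference between a pointer and the thing it points to can be seen in a short program. This is a minimal sketch, assuming the ISO library modules STextIO and SWholeIO are available; the module name DerefDemo is invented for the illustration. Because iPoint holds the address of int, an assignment to iPoint^ changes int itself.

  MODULE DerefDemo;

  IMPORT SYSTEM, STextIO, SWholeIO;

  TYPE
    IntPoint = POINTER TO INTEGER;

  VAR
    int : INTEGER;
    iPoint : IntPoint;

  BEGIN
    int := 10;
    iPoint := SYSTEM.ADR (int);   (* iPoint now holds the address of int *)
    iPoint^ := iPoint^ + 5;       (* alters the integer pointed to: int is now 15 *)
    SWholeIO.WriteInt (int, 0);   (* writes the value of int, not of iPoint *)
    STextIO.WriteLn;
  END DerefDemo.

Note that no memory had to be obtained for iPoint^ here, because the declared variable int already supplies it.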

To further illustrate by a map or picture of memory, suppose a program contains the declarations and code fragment:

  TYPE
    sPoint = POINTER TO Student;
    Student =
      RECORD
        name : ARRAY [0..80] OF CHAR;
        number : CARDINAL;
      END;

  VAR
    st : sPoint;
    sylvia : Student;

  st := SYSTEM.ADR (sylvia);
  st^.name := "Sylvia Stockforth";
  st^.number := 830924;

Note, by the way, that st^ is a record, so its field names are referred to in the usual way; one could also use

WITH st^ DO
    name := "Sylvia Stockforth";
    number := 830924;
END;

At this juncture, one could envision the memory contents looking as in figure 12.1 (the address A05B has been chosen arbitrarily).

At this point, one might legitimately ask why a programmer should go to this trouble when the variable sylvia can be more easily referred to directly than via a pointer whose value consists of its address. The answer is that not all situations are as simple as the one used here to illustrate the basic ideas; there are others in which pointers are quite useful, or even necessary.

12.1.3 The value NIL

Sometimes it is useful to initialize pointer variables to a "safe" value, that is, to a value that does not actually point anywhere. This special value has the standard identifier NIL. Just as declaring a Modula-2 numeric variable does not set its value (say, to 0) without an explicit initialization statement in the code, declaring a pointer variable does not set its value to NIL, which is a specific value; it leaves the value indeterminate. A variable that has not been explicitly initialized has whatever value is found in its memory when that memory is allocated, and that could be anything.

The Modula-2 standard identifier NIL has an anonymous type called the nil-type, and all pointer types are compatible with the nil-type.

This rule means that a variable of any pointer type, including the type ADDRESS, can be given the value NIL.
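
As a brief illustration of this rule, the following sketch (the module name NilDemo is invented for the purpose) gives the value NIL to variables of two different pointer types and to a variable of type ADDRESS.

  MODULE NilDemo;

  IMPORT SYSTEM;

  TYPE
    IntPoint = POINTER TO INTEGER;
    RealPoint = POINTER TO REAL;

  VAR
    iPoint : IntPoint;
    rPoint : RealPoint;
    adr : SYSTEM.ADDRESS;

  BEGIN
    (* Before these assignments all three values are indeterminate. *)
    iPoint := NIL;
    rPoint := NIL;
    adr := NIL;
  END NilDemo.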

NOTES: 1. A reference to NIL^ causes a run time error (raises an exception). Thus, a reference to point^ when point happens to equal NIL will always cause an error.

2. Some implementations have a compiler option (command line and/or pragma) to force automatic initialization of numeric variables to zero and pointers to NIL. This should not be relied upon, as such code would not be portable.

The short-circuited Boolean expression evaluation feature of Modula-2 can come in handy to prevent such erroneous references as this. Rather than writing

IF p^ = theInterestingValue

and taking the chance that the pointer might be NIL, one should always write:

IF (p # NIL) AND (p^ = theInterestingValue)

so that if in fact p does equal NIL, the right side of the expression will not be evaluated at all, neatly avoiding the potential error. The potentially dangerous dereference is guarded by the preceding Boolean condition. Modula-2's change from the Pascal rules for evaluating Boolean expressions was designed primarily with this very situation in view.
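
The guarded test can be packaged as a function procedure. The following is a sketch only, assuming the ISO library module STextIO; the names NilGuard, PointsToZero and Report are hypothetical and appear nowhere else in the text.

  MODULE NilGuard;

  IMPORT SYSTEM, STextIO;

  TYPE
    IntPoint = POINTER TO INTEGER;

  VAR
    int : INTEGER;
    iPoint : IntPoint;

  PROCEDURE PointsToZero (p : IntPoint) : BOOLEAN;
  BEGIN
    (* p^ is evaluated only when p # NIL is TRUE *)
    RETURN (p # NIL) AND (p^ = 0)
  END PointsToZero;

  PROCEDURE Report (p : IntPoint);
  BEGIN
    IF PointsToZero (p) THEN
      STextIO.WriteString ("points to zero");
    ELSE
      STextIO.WriteString ("is NIL, or points to a nonzero value");
    END;
    STextIO.WriteLn;
  END Report;

  BEGIN
    int := 0;
    iPoint := SYSTEM.ADR (int);
    Report (iPoint);   (* the guard passes and iPoint^ is examined *)
    iPoint := NIL;
    Report (iPoint);   (* the guard fails and iPoint^ is never evaluated *)
  END NilGuard.

Had the two operands of AND been written in the opposite order, the second call would dereference NIL before the guard was ever consulted.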