Part A--Strings

7.2 Communicating in English

The data type String (an ARRAY [0 .. n - 1] OF CHAR) has been mentioned several times, and previous programs have made considerable use of string literals. This data type is neither part of the Modula-2 language proper nor part of the predefined standard operating environment. However, strings have a wide variety of uses, so all implementations of the language include a module implementing specific instances of this data type, along with a number of procedures that act on such arrays.

Even if that were not the case, programs are free to use such arrays for their own purposes, and programmers may devise their own procedures to operate upon them.

Some things are built in. Any literal or constant string can be directly assigned (using the := operator) to an ARRAY [0 .. n - 1] OF CHAR, provided that the length of the string (its number of characters) is less than or equal to n. If it is equal, the array is filled entirely; if it is less, a special string terminator character is automatically placed in the array immediately after the last character. This is done so that programs using these arrays can tell when they have reached the last valid character.

NOTE: In classical versions of Modula-2 the string terminator character was always the null character (CHR (0)). In the ISO standard, the value of this character is implementation defined, but it is always equal to the empty string literal, written '' or "" (that is, two single or two double quotes with no characters between them). In most implementations, however, it is still likely to be CHR (0).

Once such an assignment to an ARRAY OF CHAR is done, a procedure like WriteString is able to write out the array just as it does a literal string. In anticipation of this, the code for WriteString is arranged so that it will terminate the output of characters if it should arrive at the string terminator character before it reaches the end of the array.

Naturally, a corresponding input procedure such as ReadString also places a string terminator character after any string that it reads into a longer array. Thus, some string operations are included in the Modula-2 system (language and standard modules), and, as mentioned above, there are always more in a utility module designed expressly for manipulating this data type. Indeed, the ISO standard for the language mandates that conforming implementations supply such a utility module.

Summarizing to this point, it is worth taking note of the following definitions:

A Modula-2 string is an ARRAY [0 .. n - 1] OF CHAR. Because the range must start at zero, they are said to be zero-based. Because an array that is not entirely full has a string terminator character to mark the last-used position, a Modula-2 string is said to be terminated.
When the basic structure of a data type is visible and specific instances of it must be declared, but it can otherwise be treated abstractly (perhaps because library modules or built-in routines are available to manipulate it), it may be termed an implied abstract type.

That is, the whole collection of potential string types taken together can be thought of as constituting an implied abstract type STRING, even though, strictly speaking, Modula-2 does not have an abstract string type per se. One is permitted to define string constants, and to assign string constants and literals to string variables, provided the target being assigned to has at least as many components as the source being assigned from. Moreover, in the ISO standard version of Modula-2, there is a type for the entire collection of string literals.

All string literals are said to be of S-type.

Here are a few examples. Given the following declarations:

CONST
  mesg = "Hello there";
  name = "Fred";
TYPE
  String10 = ARRAY [0..10] OF CHAR;
  String11 = ARRAY [0..11] OF CHAR;
VAR
  string1, string2 : String10;
  string3 : String11;

then, in a program, the following assignments and statements have the indicated effects. (In the diagrams, each cell between vertical bars is one position in the array, the symbol ø indicates the string terminator character, and ? marks a position whose contents are undefined.)

string1 := mesg;      (exactly filled)

  |H|e|l|l|o| |t|h|e|r|e|

string2 := name;      (last part undefined)

  |F|r|e|d|ø|?|?|?|?|?|?|

WriteString (mesg);   (writes Hello there)
ReadString (string1);

Suppose the person answered by typing the string "Yes."

string1 would now hold:

  |Y|e|s|.|ø|?|?|?|?|?|?|

Note that the assignment rule mentioned above means that

string3 := "Now is then.";

is a valid assignment because the literal being assigned has twelve characters, exactly as many as a String11 can hold, but that none of

string2 := "Now is then.";
string2 := string3;
string3 := string2;

is valid: the first because the target type is too small, and the other two because String10 and String11 are different types, so the ordinary rule for array assignment (which requires the two arrays to be of the same type) applies.
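Because string2 := string3 is rejected, a program that really does need to move text between two different string types has to copy it position by position. The following is a minimal sketch of such a copy. The module name CopyDemo and the procedure name CopyString are hypothetical, output is assumed to come from the ISO library module STextIO, and the terminator constant follows the same "" convention this section uses.

```modula2
MODULE CopyDemo;
(* Hypothetical example: copy a string between two different array types,
   which the assignment string2 := string3 is not allowed to do. *)

FROM STextIO IMPORT WriteString, WriteLn;

CONST
  terminator = "";   (* assumed to denote the string terminator, as in the text *)

TYPE
  String10 = ARRAY [0..10] OF CHAR;
  String11 = ARRAY [0..11] OF CHAR;

VAR
  string2 : String10;
  string3 : String11;

PROCEDURE CopyString (source : ARRAY OF CHAR; VAR dest : ARRAY OF CHAR);
(* Copies source into dest up to source's terminator or its last index,
   silently truncating if dest is too small, and terminating dest if
   there is room left. *)
VAR
  i : CARDINAL;
BEGIN
  i := 0;
  WHILE (i <= HIGH (source)) AND (i <= HIGH (dest))
        AND (source [i] # terminator) DO
    dest [i] := source [i];
    INC (i)
  END;
  IF i <= HIGH (dest) THEN
    dest [i] := terminator   (* mark the end of the copied text *)
  END
END CopyString;

BEGIN
  string3 := "Now is then.";       (* twelve characters: exactly fills string3 *)
  CopyString (string3, string2);   (* string2 has room for only eleven *)
  WriteString (string2);           (* writes Now is then *)
  WriteLn
END CopyDemo.
```

Because the loop checks the bounds of both arrays, the copy never indexes past either one; a too-long source is simply cut off, in the same spirit as the Concat procedure discussed later in this section.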

In addition, literals containing no characters at all are regarded as strings according to this definition, and single-character literals can be taken either as type CHAR or as type ARRAY [0..0] OF CHAR. A single character can also be written as its ASCII number in octal (base eight) notation followed by the letter C, and such a literal likewise counts as a string. For more details on number bases, see Chapter eight. On the other hand, constants declared using the function CHR are of type CHAR only, not strings. Thus, if in addition to the last set of declarations, we had:

CONST
  CR = 15C;
  space = " ";
  empty = "";
  LF = CHR (10);

then the following assignments are valid:

string1 := CR;
string1 := space;
string1 := empty;

but

string1 := LF;

is invalid, because LF is of type CHAR only.

Notice that the assignment to the string variable does not affect any positions in the array beyond those necessary to do the assignment. The characters after the string terminator are no longer of any interest, for even though the history of this particular variable's use does tell us what those characters are, they should be regarded as undefined insofar as the string variable is concerned.

The details just described affect the way that string operations are coded, whether these are imported from modules, or are user-devised.

To see how this is so, consider how a programmer could write some typical string operations. Two fairly easy things one might like to be able to do are:

1. compute the number of active characters in a string, and
2. join two strings together.

The number of characters in a string is called its length.
When two strings are joined end-to-end to make a new string whose length is the sum of their lengths, the process is called concatenation.

Example:

"HOW TO" is a string of length six and " PROGRAM" has length eight (note the spaces). The concatenation is "HOW TO PROGRAM" and has length fourteen.

When only string literals and string constants (not variables) are involved, ISO standard Modula-2 implements within the language proper a concatenation operator "+", so that one may write:

CONST
  CR = 15C;
  LF = 12C;
  DOSLineEnd = CR + LF;
  strConst = "Hi" + " There";
  strConst2 = strConst + DOSLineEnd;

or an assignment such as

  strVar := "Hello" + " world";

but may not write:

  string1 := string1 + string2;

or even

CONST
  return = CHR (13);
  strConst = "Hello" + return;

because return, having been declared with CHR, is of type CHAR and cannot be used as a string of length one.

Likewise, if in addition to the above, one has:

TYPE
  String80 = ARRAY [0..79] OF CHAR;
VAR
  str : String80;

then the assignment

str := strConst + str;

is also illegal, because the concatenation operator cannot be used with variables of a declared string type, only with string literals and constants (that is, with operands of the S-type).

Within these restrictions, + has the same meaning as the concatenation operation described above. A procedure such as Concat must still be included in a library module for operations on string variables.

What follows is a portion of a library module that could implement an instance of this data type together with these two operations.

The procedure Length works by examining each character in the array passed to it until it either passes the last index of the array or reaches a string terminator, whichever comes first. The number of characters examined before the loop exits (not counting the terminator) is the length of the string.

The procedure Concat starts by placing the first string into the result; then, if there is room left, it puts as much of the second one in as possible. The result will either be entirely filled, or will end with a string terminator placed immediately after the last character copied. If the concatenation of the two strings would contain too many characters to fit into the array being used to hold the result, the extra characters are quietly discarded and no error is generated; the result is said to be "silently truncated."

DEFINITION MODULE Strings;

TYPE
  String = ARRAY [0 .. 79] OF CHAR; (* convenience type *)

PROCEDURE Length (str : ARRAY OF CHAR) : CARDINAL;
(* returns the number of characters in a string up to a string terminator, or the end of the array, whichever is less *)

PROCEDURE Concat (str1, str2 : ARRAY OF CHAR;
          VAR result : ARRAY OF CHAR);
(* This procedure concatenates two strings.  It will use as much of the two as possible, silently truncating the result if there is not enough room. *)

END Strings.

IMPLEMENTATION MODULE Strings;

CONST
  terminator = "";

PROCEDURE Length (stringVal: ARRAY OF CHAR): CARDINAL;
  (* Returns the length of stringVal *)
  
VAR
  count : CARDINAL; (* Counting Variable *)
  hiStr : CARDINAL; (* hold high of string for comparisons *)
  
BEGIN
  hiStr := HIGH (stringVal);
  count := 0; 
  WHILE (count <= hiStr) AND (stringVal[count] # terminator)
    DO
      INC(count);
    END;
  RETURN count;
END Length;

PROCEDURE Concat (str1, str2 : ARRAY OF CHAR;
          VAR result : ARRAY OF CHAR);

VAR
  max, rcount, scount : CARDINAL;

BEGIN
  max := HIGH (result);
    (* max is the highest index available in the result *)
  rcount := 0;    (* initialize result string count *)
  WHILE (rcount <= HIGH (str1))
               AND (str1 [rcount] # terminator) 
               AND (rcount <= max)
    DO
      result [rcount] := str1 [rcount];
      INC (rcount)   (* Put in as much of str1 *)
    END ;   (* as will fit *)
  IF rcount <= max   (* room left? *)
    THEN   (* yes, so continue where str1's terminator would have gone *)
      scount := 0;    (* set counter for second string *)
      WHILE (scount <= HIGH (str2))
               AND (str2 [scount] # terminator)
              AND  (rcount <= max)
        DO
          result [rcount] := str2 [scount];  (* and put in as *)
          INC (rcount);    (* much of it as will fit too *)
          INC (scount);
        END; 
    END;   (* if *)
  IF rcount <= max   (* still room left? *)
    THEN
      result [rcount] := terminator; (* put in terminator *)
    END;
END Concat;

END Strings.
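To see the two procedures at work, here is a small client program, offered as a sketch rather than as part of the library. The module name StringsDemo is hypothetical, output is assumed to use the ISO library modules STextIO and SWholeIO, and the comments give the output expected from the implementation above. Note that an ISO system also supplies its own library module called Strings (which does not export a type named String), so in practice the module above may need a different name before this will compile.

```modula2
MODULE StringsDemo;
(* Hypothetical client of the Strings module above.  Exercises Length and
   Concat, including a case where the result is silently truncated. *)

FROM Strings IMPORT String, Length, Concat;
FROM STextIO IMPORT WriteString, WriteLn;
FROM SWholeIO IMPORT WriteCard;

TYPE
  Short = ARRAY [0 .. 9] OF CHAR;   (* room for only ten characters *)

VAR
  long  : String;                   (* eighty positions *)
  short : Short;

BEGIN
  (* Plenty of room: 6 + 8 = 14 characters, then a terminator. *)
  Concat ("HOW TO", " PROGRAM", long);
  WriteString (long); WriteString (" has length ");
  WriteCard (Length (long), 1); WriteLn;    (* HOW TO PROGRAM has length 14 *)

  (* Only ten positions: the last four characters, GRAM, are dropped. *)
  Concat ("HOW TO", " PROGRAM", short);
  WriteString (short); WriteString (" has length ");
  WriteCard (Length (short), 1); WriteLn;   (* HOW TO PRO has length 10 *)

  (* A legal substitute for the illegal long := "Hello" + long.  This is
     safe because str1 and str2 are value parameters, that is, copies. *)
  long := " world";
  Concat ("Hello", long, long);
  WriteString (long); WriteLn               (* Hello world *)
END StringsDemo.
```

The last call shows why it helps that Concat takes its two source strings as value parameters: because they are copied on entry, the same variable can safely appear both as a source and as the result.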

Notice that HIGH (str1) and HIGH (str2) do not return a number corresponding to the length of the string. Instead (as always) they return the highest index of the open formal parameter array once the actual parameter has been passed to it. If one passes a literal string instead of an array variable, then HIGH (str1) is one less than the length of the string, but if one passes an object of the type String (above), then all eighty places are passed to the formal parameter. This is reflected by HIGH (str1), which is therefore 79 for this data type even if not all eighty CHARs are actually in use (that is, even if there is a string terminator somewhere at or before position 79).

Example:

If one had:

  VAR
    str : String;

  PROCEDURE PrintMax (str : ARRAY OF CHAR);

  BEGIN
    WriteCard (HIGH (str), 1)
  END PrintMax;

then

  PrintMax ("HELLO");

would print 4, whereas

  str := "HELLO";
  PrintMax (str);

would print 79, because the actual parameter is the eighty-element variable str, not the five-character literal "HELLO".
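The difference between HIGH and the length of the string an array holds can be seen side by side in the following sketch, which passes the same text first as a literal and then inside a String variable. The module and procedure names HighVsLength and Report are hypothetical; Length is the one from the Strings module above, and output is assumed to use the ISO library modules STextIO and SWholeIO.

```modula2
MODULE HighVsLength;
(* Hypothetical example contrasting HIGH, which reports the last index of
   the array actually passed, with Length, which counts characters up to
   the string terminator. *)

FROM Strings IMPORT String, Length;
FROM STextIO IMPORT WriteString, WriteLn;
FROM SWholeIO IMPORT WriteCard;

VAR
  str : String;

PROCEDURE Report (s : ARRAY OF CHAR);
(* Writes both numbers for whatever array is passed in. *)
BEGIN
  WriteString ("HIGH = "); WriteCard (HIGH (s), 1);
  WriteString (", Length = "); WriteCard (Length (s), 1);
  WriteLn
END Report;

BEGIN
  Report ("HELLO");   (* per the rule above: HIGH = 4, Length = 5 *)
  str := "HELLO";
  Report (str)        (* HIGH = 79, Length = 5 *)
END HighVsLength.
```

Only the second number stays the same in both calls, which is why string procedures such as Concat must look for the terminator rather than rely on HIGH alone.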

