8.3 High Level Access to Low Level Facilities in Modula-2

In common with the implementations of most other languages, all versions of Modula-2 use words as units of data and one memory location of this size is employed to store a CARDINAL. One of them is usually used to store an INTEGER as well, just by interpreting any binary number with a "1" in the highest bit place as a negative and with a "0" in that location as a positive. Thus the sixteen bit binary number

1111 1111 1111 1111

interpreted as a CARDINAL is typically 65535, but as an INTEGER is typically a representation of -1 (subtracting one from zero changes all the bits to ones).

These details are not usually important to the programmer working at the high level, but when they are, it becomes necessary to access such low level features from the high level platform Modula-2 affords. For such instances:

A Modula-2 data storage unit has the type LOC, and a Modula-2 storage location has the type ADDRESS.

Because the implementation of these two data types must always imply some knowledge of the machine on which they are implemented, neither is included in the language proper. Rather, along with certain other low-level constructs, these items are segregated from the language and placed into the module SYSTEM, from which they must be imported into any other module that uses them. This means that the occurrence of the line

FROM SYSTEM IMPORT
  LOC, ADDRESS, <anything else>;

immediately marks the module containing this import as low-level dependent, and therefore probably not portable to another environment.

Strictly speaking, SYSTEM is not a separate library module at all, but a segregated part of the compiler. By convention, its name and contents are all uppercase, but SYSTEM is not a standard identifier, and neither are LOC or ADDRESS.

The module SYSTEM, and any other low-level dependent module that is a segregated part of the compiler rather than a separate library module, is called a system module or a pseudo-module.

8.3.1 The Module SYSTEM

While not actually a separate library module, SYSTEM behaves as though it had the definition module shown below.

NOTE: Some of the contents of SYSTEM have been omitted and will be discussed later in the text. The meaning of the reserved word POINTER can be found in Chapter 12, and its use will not be detailed here. Some versions of SYSTEM may have additional items necessary for the implementation at hand.

DEFINITION MODULE SYSTEM;

CONST
  BITSPERLOC    = <implementation-defined constant> ;
  LOCSPERWORD   = <implementation-defined constant> ;

TYPE
  LOC; 
  ADDRESS = POINTER TO LOC; 
  WORD = ARRAY [0 .. LOCSPERWORD-1] OF LOC;

  (* BYTE and LOCSPERBYTE are provided if appropriate for machine *)
  
CONST
  LOCSPERBYTE = <implementation-defined constant> ;

TYPE
  BYTE = ARRAY [0 .. LOCSPERBYTE-1] OF LOC;

PROCEDURE ADDADR (addr: ADDRESS; offset: CARDINAL): ADDRESS;
  (* Returns address given by (addr + offset), or may raise an exception if this address is not valid. *)

PROCEDURE SUBADR (addr: ADDRESS; offset: CARDINAL): ADDRESS;
  (* Returns address given by (addr - offset), or may raise an exception if this address is not valid. *)

PROCEDURE DIFADR (addr1, addr2: ADDRESS): INTEGER;
  (* Returns the difference between addresses (addr1 - addr2), or may raise an exception if the arguments are invalid or address space is non-contiguous. *)

PROCEDURE MAKEADR (val: <some type>; ... ): ADDRESS;
  (* Returns an address constructed from a list of values whose types are implementation-defined, or may raise an exception if this address is not valid. *)

PROCEDURE ADR (VAR v: <anytype>): ADDRESS;
  (* Returns the address of variable v. *)

PROCEDURE CAST (<targettype>; val: <anytype>): <targettype>;
  (* CAST is a type transfer function.  Given the expression denoted by val, it returns a value of the type <targettype>.  An invalid value for the target value or a physical address alignment problem may raise an exception. *)

END SYSTEM.

The procedures ADDADR, SUBADR, and DIFADR are provided to allow arithmetic to be performed on items of the abstract data type ADDRESS. The details of MAKEADR vary from one machine to another. This procedure is intended to allow for the construction of a valid address from some other type of data. In most implementations the parameters will be one or more CARDINALs.

ADR is intended to allow a program to discover the address of one of its own variables. When a module is loaded into memory, the variable declarations are resolved into addresses, and all references to them within the actual code from that point on are to these addresses. ADR returns, in an item of the type ADDRESS, the machine location of one of these variables. One might write a fragment such as:

FROM SYSTEM IMPORT
  WORD, ADDRESS, ADR;
  
VAR
  theWord : WORD;
  theAdr : ADDRESS;
  card : CARDINAL;
 
BEGIN
  theAdr := ADR (card);

However, the useful purposes to which this can be put are not described in detail until Chapter 12.
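In the meantime, the following minimal sketch shows ADR working together with the address arithmetic procedures. It assumes an ISO compiler whose SYSTEM exports the names shown in the definition above, and it assumes the ISO library modules STextIO and SWholeIO are available for output. The module name and the variable names are invented for illustration.

```modula-2
MODULE AddressDemo;
(* Sketch only: assumes ISO SYSTEM, STextIO and SWholeIO are available. *)

FROM SYSTEM IMPORT
  ADDRESS, ADR, ADDADR, SUBADR, DIFADR;
FROM STextIO IMPORT
  WriteString, WriteLn;
FROM SWholeIO IMPORT
  WriteInt;

VAR
  table : ARRAY [0 .. 9] OF CARDINAL;
  start, ahead, back : ADDRESS;

BEGIN
  start := ADR (table);         (* where the array begins *)
  ahead := ADDADR (start, 4);   (* four LOCs further on, still inside table *)
  back := SUBADR (ahead, 4);    (* and four LOCs back again *)

  WriteString ("ahead - start = ");
  WriteInt (DIFADR (ahead, start), 1);   (* difference is measured in LOCs *)
  WriteLn;

  IF back = start THEN
    WriteString ("SUBADR undid ADDADR");
    WriteLn
  END
END AddressDemo.
```

Note that the offsets count storage units (LOCs), not array elements, so how many CARDINALs the four LOCs span depends on the machine.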

CAST plays a role related to, but different from, that of the standard procedure VAL. When it is necessary to safely convert a value of one type to a value of another type (so as to construct an expression), VAL is always preferable. When, on the other hand, it is desired to forcibly re-interpret the bit pattern of an item of one type as though it were of another type without any conversion, CAST is used instead. Clearly, the programmer must know how the two data types are represented at the low level (bit pattern), or the result of the CAST operation is not usable. Thus, if one has

VAR
  int : INTEGER;
  card : CARDINAL;

then, both

  int := int + VAL (INTEGER, card);

and

  
  int := int + CAST (INTEGER, card);

are syntactically valid, but the latter assumes that the storage bit pattern of the CARDINAL has some particular meaning when interpreted as an INTEGER. This may or may not be true. Going the other way, if int were negative, CASTing it to a CARDINAL would be meaningful only if the programmer also knew how many bits were used to store the INTEGER and how to interpret the sign bit after the CAST.

Type changes made using VAL are called safe conversions.
Type changes made using CAST are called unsafe conversions.
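The difference can be seen in a small sketch such as the one below. It assumes the ISO library modules STextIO and SWholeIO for output, and the module name is invented. What the CAST line prints is an assumption about the machine: where INTEGER and CARDINAL occupy the same storage and negatives are held in two's complement, the result is MAX (CARDINAL) (65535 on the sixteen bit machine described at the start of this section).

```modula-2
MODULE CastDemo;
(* Sketch only: the CAST result depends on the machine representation. *)

FROM SYSTEM IMPORT
  CAST;
FROM STextIO IMPORT
  WriteLn;
FROM SWholeIO IMPORT
  WriteInt, WriteCard;

VAR
  int : INTEGER;
  card : CARDINAL;

BEGIN
  card := 100;
  int := VAL (INTEGER, card);     (* safe: the value 100 is preserved *)
  WriteInt (int, 1);
  WriteLn;

  int := -1;
  card := CAST (CARDINAL, int);   (* unsafe: the bits are reused unchanged *)
  WriteCard (card, 1);            (* not -1; see the assumption above *)
  WriteLn

  (* By contrast, VAL (CARDINAL, int) with int = -1 is an error,
     because -1 has no CARDINAL value to convert to. *)
END CastDemo.
```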

These low level facilities are detailed at this point not because the student is expected to make great use of them yet, but to illustrate that Modula-2 can be used at the low level. This makes it a language in which operating systems and other software that uses intimate knowledge of the machine can be written.

8.3.2 Variables at Fixed Addresses

Modula-2 provides the ability to declare a variable to reside at a fixed address (or, more accurately, to assign a variable name to the contents of the machine at a particular address). This is done by giving a constant in brackets after the variable at the time it is declared.

VAR
  flag [768] : INTEGER;
  bottom [0] : CARDINAL;
  somewhere [16238] : CHAR;

The variable can be of any type and will start at the specified address. Its space extends for the number of storage locations normally taken by an integer, cardinal, char, etc. This particular facility was not required by Wirth's definition of Modula-2, but his suggestion that it be provided was quite strong and most implementations had it. It is required to be provided in ISO standard Modula-2.

NOTES: 1. This is a machine-dependent facility, and code written to take advantage of it is not portable to another system.

2. For this reason, this syntax cannot be used in ISO standard Modula-2 unless the identifier MAKEADR has been imported from SYSTEM even if MAKEADR is not explicitly used.

3. In some operating systems, user programs are not allowed to have low level access to addresses and some of these capabilities will not be present in any notation available to the programmer.

One use of this facility for system programmers is to access single memory locations (LOCs). Some such memory locations may serve the role of hardware switches on I/O locations (among many other uses). Referencing one memory location (LOC) might control turning on, say, the high resolution graphics mode for the machine, and the one just before it might turn it off again. Usually, any reference at all to such a "switch" will cause it to act. An assignment to the variable declared to be there will do quite nicely.
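A minimal sketch of such a pair of switches follows. Both addresses and all the names are invented for illustration and do not describe any particular machine. MAKEADR is imported only to satisfy the ISO requirement mentioned in note 2 above, and some compilers may want the address written differently.

```modula-2
MODULE SoftSwitch;
(* Sketch only: 49238 and 49239 are hypothetical switch addresses. *)

FROM SYSTEM IMPORT
  MAKEADR;   (* not called; imported because of the ISO rule in note 2 *)

VAR
  graphicsOff [49238] : CHAR;   (* hypothetical: touching this turns the mode off *)
  graphicsOn [49239] : CHAR;    (* hypothetical: touching this turns the mode on *)

PROCEDURE HiResOn;
BEGIN
  graphicsOn := CHR (0)    (* the value is irrelevant; the reference is what counts *)
END HiResOn;

PROCEDURE HiResOff;
BEGIN
  graphicsOff := CHR (0)
END HiResOff;

BEGIN
  HiResOn;
  HiResOff
END SoftSwitch.
```

Wrapping each switch in a small procedure keeps the machine-dependent addresses in one place, so the rest of a program need never mention them.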

Clearly, one must consult the manual of the computer before accessing the contents of specific addresses, as very undesirable side effects (erasing a disk?) can easily be caused if one acts without due care.

8.3.3 Hexadecimal and Octal Notation in Modula-2

If one finds it useful, Modula-2 allows one to write constants in hexadecimal, provided they begin with a decimal digit and are followed by the letter H. It will also allow one to write them in octal, by following them with the letter B. As indicated briefly in Chapter 7, single character constants compatible with CHAR or the implied string type can be given by the ordinal value in octal, followed by a C. Here are some examples:

CONST
  a = 0A5H;    (* can't start with letter *)
  b = 651B;
  c = 177777B;
  EOL = 15C;  (* common end of line character *)
  d = 789H ;   (* starts with number *)
VAR
  somewhere [0FFFFH] : CARDINAL;
  
PROCEDURE Length (str : ARRAY OF CHAR) : CARDINAL;

VAR
  count : CARDINAL;

BEGIN
  count := 0;
  WHILE (count <= HIGH (str)) AND (str [count] # 0C)
    DO
      INC (count)
    END;
  RETURN count;
END Length;

Note the handy use of 0C instead of CHR(0) in this version of the procedure Length, written for a system where the string terminator is known to be the null character. The only practical difference between the two is that CHR(0) takes six keystrokes to type and 0C only two. If the terminator were not known, the more portable character literal "" is preferable.
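To check what such constants stand for, one can simply print them in decimal. The sketch below does so for several of the constants declared above. It assumes the ISO library modules STextIO and SWholeIO, and the module and procedure names are invented for illustration.

```modula-2
MODULE BaseDemo;
(* Sketch only: prints the decimal values of some hex and octal constants. *)

FROM STextIO IMPORT
  WriteString, WriteLn;
FROM SWholeIO IMPORT
  WriteCard;

CONST
  a = 0A5H;
  b = 651B;
  c = 177777B;
  EOL = 15C;

PROCEDURE Show (name : ARRAY OF CHAR; value : CARDINAL);
BEGIN
  WriteString (name);
  WriteString (" = ");
  WriteCard (value, 1);
  WriteLn
END Show;

BEGIN
  Show ("0A5H", a);               (* 10*16 + 5 = 165 *)
  Show ("651B", b);               (* 6*64 + 5*8 + 1 = 425 *)
  Show ("177777B", c);            (* sixteen one bits = 65535 *)
  Show ("ORD (15C)", ORD (EOL))   (* 1*8 + 5 = 13 *)
END BaseDemo.
```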

NOTES: 1. Only the numbers 0 through 127 (decimal) define standard characters according to the ISO sequence that underlies Modula-2. Since a LOC (usually a byte) is used to store such a character, codes through CHR (255) or 377C (at least) are valid. However, what such a "character" would look like if output to the screen or printer is very much hardware dependent. On some machines it could be a regular ISO character in black/white inverse, and on others it could be a special graphics symbol, a Greek letter, an accented character, or something else. There is no Modula-2 standard for characters in the range 128 .. ORD (MAX (CHAR)).

2. Many machines are now using two byte character coding called Unicode, in order to code languages such as Chinese that have many thousands of characters. In such machines, when the actual language employed is based on Roman script and only 128 characters are needed, the most significant byte of the character is usually set to zero and ignored.

Hexadecimal numbers are convenient for expressing addresses as well as data. Small computers once had sixteen bits with which to express an address, so addresses could range from zero through hexadecimal 0FFFFH (65535 decimal).

NOTE: The number of bits available to the computer to make addresses is independent of the number of data bits. While a computer with eight data bits generally had 16 address lines, a sixteen bit computer may have had sixteen, twenty, or twenty-four address lines. The maximum amount of directly addressable memory in these three cases is therefore 64K, 1M and 16M bytes respectively. A thirty-two data bit computer might have many more address lines.

And now, a little example. Here is a procedure that takes advantage of the fact that on its target machine, there is a fixed location in memory for keyboard data coming in. The keyboard location in this machine is 0C000H and the value at that location is less than 128 if no key has been pressed since the last time any reference was made to the location 0C010H. Whenever the latter location is accessed, the most significant bit in 0C000H is set back to zero, leaving a number below 128.

This procedure waits for the user to press a key before going on, hits the keyboard strobe (0C010H) to reset the key location for the next routine checking this, and then exits with the character value of the key pressed in the type CHAR.

PROCEDURE Keypress () : CHAR;
VAR
  Keyboard [0C000H] : CHAR;   (* single loc in this version *)
  KbdStrobe [0C010H] : CHAR;
BEGIN
  REPEAT
  UNTIL Keyboard > CHR (127); (* A keypress--high bit is set *)
  KbdStrobe := CHR (0);  (* reset *)
  RETURN Keyboard;
  (* High bit only is stripped to zero by strobe reset, so in correct ISO range *)
END Keypress;

As can be seen, writing a Keypress function procedure on this level always involves an intimate knowledge of the specific workings of the target hardware device. However, with such knowledge, suitable low-level procedures can be coded directly in Modula-2, without resorting to separate machine language routines. It should, in fact, be possible to modify this particular example for specific hard-wired memory locations on many computers, provided their function is well-documented. Clearly, such code cannot be ported to any system other than the original target.

