1.7 Data Manipulation Abstractions (Expressions)

Along with the concept of an ADT (data and operations) comes the idea of writing down combinations of data items using some notation for the operations. The notational abstractions used for such operations as, say, addition, depend somewhat on those used to represent the data. For instance, the operation eleven plus twenty could be represented as:

(|||||||||||) (||||||||||||||||||||)

This is what children do when they are learning how to make number abstractions and they count with sticks, marbles, pictures of teddy bears or jars of jelly beans.

At a higher level of abstraction, one combines several symbols into one, and could write:

XI + XX,

a notation that, despite its inconveniences, served Europe for many centuries.

Later, the concept of place value and the idea of using zero as a placeholder became accepted. With the adoption of this Arabic system, the modern numeric representation came into being, and along with it came streamlined rules for performing operations, all of which transfer more or less directly into most computing notations. Thus, we write

3 + 5 + 16 (addition)

7 - 2 (subtraction)

-4 * 6 (multiplication)

3.0 / 7.0 (division)

for various numeric data types. Note the use of * and / for multiplication and division, respectively. This practice is all but universal, for most keyboards lack the usual mathematical symbols for these operations.

A combination of data items with various operators that are available for that data type is called an expression.

Performing the operations and extracting a single numeric result is called evaluating the expression.

Thus the evaluation of 3 + 5 + 16 produces 24, of -4 * 6 produces -24, and so on.
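As a rough illustration, the expressions above can be handed to a programming notation, which evaluates each one to a single result. Python is used here only as a convenient stand-in; the text does not name a particular notation.

```python
# Evaluate each expression from the text; each produces exactly one result.
results = {
    "3 + 5 + 16": 3 + 5 + 16,   # addition
    "7 - 2": 7 - 2,             # subtraction
    "-4 * 6": -4 * 6,           # multiplication
    "3.0 / 7.0": 3.0 / 7.0,     # division of reals
}

for text, value in results.items():
    print(f"{text} evaluates to {value}")
```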

Of course, the concept of type attaches to the result of evaluating the expression, as well as to the individual items that make it up. Thus, all the expressions above are evidently of numeric types.

Now consider expressions such as:

15 > 6

3.0 < 2.0, or

-3 = 7

These contain numeric data connected by comparisons rather than arithmetic operations. They produce the values True, False, and False, respectively, and are classified as Boolean expressions. We say:

The type of an expression is the same as the type of the data produced when the expression is evaluated.
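The rule can be checked by asking a notation for the type of each result. The sketch below assumes Python as the notation; Python spells the equality comparison `==` rather than `=`, and its names for the types (`int`, `float`, `bool`) are Python's own.

```python
# The type of an expression is the type of the value it produces.
examples = [
    ("3 + 5 + 16", 3 + 5 + 16),
    ("3.0 / 7.0", 3.0 / 7.0),
    ("15 > 6", 15 > 6),
    ("3.0 < 2.0", 3.0 < 2.0),
    ("-3 = 7", -3 == 7),   # Python writes the equality test as ==
]

for text, value in examples:
    print(f"{text:12} -> {value!r:22} of type {type(value).__name__}")
```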

Naturally, this applies to expressions with named symbols for constants or variables as well. One often writes formulas, with a variable on the left-hand side and an expression on the right-hand side, to mean that the expression is to be evaluated and the result (with its type) will have the name given on the left. Examples include:

interest = principal * rate * time

distance = speed * time

The intent in using such formulas is that they stand abstractly for a whole class of possible computations, the actual numeric details for which can be filled in later. Naturally, such facilities are available in most computing notations as well. In such cases, the name also represents a memory location, and one can think of the value as being deposited in that named location for later reference.
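One way to capture "a whole class of possible computations" is to write each formula once as a function and supply the numbers later. This is a sketch in Python; the function names and the sample figures are hypothetical, chosen only for illustration.

```python
def simple_interest(principal: float, rate: float, time: float) -> float:
    """interest = principal * rate * time"""
    return principal * rate * time

def distance(speed: float, time: float) -> float:
    """distance = speed * time"""
    return speed * time

# The numeric details are filled in only when a particular computation is wanted;
# each result is bound to a name so it can be referred to later.
interest = simple_interest(1000.0, 0.05, 3.0)
trip = distance(60.0, 2.5)
print(interest, trip)
```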

Like algebraic expressions, Boolean ones may also be represented by names, and combined to form more complex expressions. For instance, if p and q are Boolean expressions, then:

not p is false whenever p is true, and vice versa,

p and q is true whenever both p and q are true, and

p or q is true whenever either of p or q (or perhaps both) is true.

In forming Boolean expressions, "and," "or," and "not" are called connectives.
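The three definitions above can be summarized in a truth table by trying every combination of values for p and q. A short Python sketch (the function name `truth_table` is hypothetical):

```python
from itertools import product

def truth_table():
    """Return rows (p, q, not p, p and q, p or q) for every combination of p and q."""
    return [(p, q, not p, p and q, p or q) for p, q in product([True, False], repeat=2)]

print(f"{'p':7}{'q':7}{'not p':7}{'p and q':9}{'p or q':7}")
for p, q, not_p, both, either in truth_table():
    # !s formats each bool as True/False rather than as 1/0
    print(f"{p!s:7}{q!s:7}{not_p!s:7}{both!s:9}{either!s:7}")
```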

1.7.1 Precedence

Some expressions are ambiguous unless rules are adopted to make their meaning clear. Thus

3 + 4 * 5

could produce 35 or 23, depending on whether the addition or the multiplication is performed first. To ensure that such problems do not arise, mathematicians adopt a "convention" or set of rules to evaluate otherwise ambiguous expressions.

By this convention, multiplication and division are performed before addition and subtraction, but parentheses can modify this order. Operators of equal precedence are evaluated from left to right. Thus, the correct evaluation of the expression above produces 23. Many calculators do not follow these rules; they evaluate an expression as it is entered. However, computers are more expensive than calculators, and one can reasonably expect their programming notations to handle the mathematically correct order of operations.
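To make the difference concrete, the Python sketch below evaluates the same text two ways: strictly left to right, as many calculators do, and by the conventional precedence rules. The function names are hypothetical, the calculator version handles only non-negative integer literals without parentheses, and recursive descent is simply one common way to implement precedence, not a technique the text prescribes.

```python
import re

def tokenize(text: str) -> list[str]:
    """Split an expression into integer literals, operators, and parentheses."""
    return re.findall(r"\d+|[-+*/()]", text)

def calculator_style(text: str) -> float:
    """Apply each operator as soon as its right operand arrives (strict left to right)."""
    tokens = tokenize(text)
    result = float(tokens[0])
    for op, num in zip(tokens[1::2], tokens[2::2]):
        n = float(num)
        if op == "+": result += n
        elif op == "-": result -= n
        elif op == "*": result *= n
        else: result /= n
    return result

def conventional(text: str) -> float:
    """Evaluate with parentheses first, * and / before + and -, otherwise left to right."""
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def factor() -> float:
        tok = take()
        if tok == "(":
            value = expr()
            take()              # consume the matching ")"
            return value
        if tok == "-":          # unary minus, as in -4
            return -factor()
        return float(tok)

    def term() -> float:        # a run of * and / (higher precedence)
        value = factor()
        while peek() in ("*", "/"):
            op = take()
            rhs = factor()
            value = value * rhs if op == "*" else value / rhs
        return value

    def expr() -> float:        # a run of + and - (lower precedence)
        value = term()
        while peek() in ("+", "-"):
            op = take()
            rhs = term()
            value = value + rhs if op == "+" else value - rhs
        return value

    return expr()

print(calculator_style("3 + 4 * 5"))  # 35.0
print(conventional("3 + 4 * 5"))      # 23.0
```

Structuring the parser as one function per precedence level (`expr` for + and -, `term` for * and /) is what makes multiplication bind more tightly than addition, and the `while` loops give left-to-right evaluation among equals.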

Here are a number of evaluations, with the results shown at right:

	x = 2 + 6 / 3 	 	4 ==> x
	x = 3 - 6 * (7 + 3)	-57 ==> x
	x = 3 - 4 + 6 * 7	 	41 ==> x
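Python follows the same arithmetic convention, so the three evaluations above can be checked directly. One difference to note: Python's `/` always produces a real (float) result, so the first value appears as `4.0` rather than `4`.

```python
a = 2 + 6 / 3          # division first: 2 + 2.0
b = 3 - 6 * (7 + 3)    # parentheses, then *, then -: 3 - 60
c = 3 - 4 + 6 * 7      # * first, then left to right: (3 - 4) + 42
print(a, b, c)         # 4.0 -57 41
```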

When expressions contain Booleans, numeric comparisons have the lowest precedence, the or connective has the precedence of addition, the and connective has the precedence of multiplication, and not has higher precedence than either. Again, parentheses can modify this order.

	2 + 3 <= 5   	true
	(1 < 2) and (-4 < 7)	true
	(2 >= 5) or (8 < 6) 	false
	not (1 = 1)  	false
	(4 > 2 + 2) 	false
	5 < 1 and 3 < 4	cannot evaluate: and binds more tightly than <, so this reads as 5 < (1 and 3) < 4, and 1 and 3 makes no sense.
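Not every notation adopts this convention for Boolean operators. Python, for instance, gives comparisons higher precedence than `and`, `or`, and `not`, so it reads the last row as `(5 < 1) and (3 < 4)` and returns `False` instead of rejecting it. The sketch below evaluates each row in Python (writing equality as `==`); the first five rows mean the same thing under either convention.

```python
rows = {
    "2 + 3 <= 5": 2 + 3 <= 5,
    "(1 < 2) and (-4 < 7)": (1 < 2) and (-4 < 7),
    "(2 >= 5) or (8 < 6)": (2 >= 5) or (8 < 6),
    "not (1 = 1)": not (1 == 1),
    "(4 > 2 + 2)": (4 > 2 + 2),
    # Under the text's convention this is 5 < (1 and 3) < 4, an error.
    # Python binds < more tightly than and, so it means (5 < 1) and (3 < 4).
    "5 < 1 and 3 < 4": 5 < 1 and 3 < 4,
}

for text, value in rows.items():
    print(f"{text:24} {value}")
```

The moral is that precedence belongs to the notation, not to mathematics; when in doubt, parentheses make the intended grouping explicit in any notation.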

1.7.2 Expression compatibility

As the last example illustrates, it makes little sense to mix data items of different types in the same expression. (What operator could be used? What type would result?) Thus 4.0 < True or 3 - False are rather obvious errors.

On the other hand, the numeric operations are defined (with essentially the same meaning) for several types. For example, one can write whole number or real addition expressions. This leads to some interesting difficulties when writing mixed expressions (containing more than one type).

An expression like -2 + 5 can be evaluated mathematically to 3 without giving any thought to such issues, but in a computing machine things are not so simple. Some notations take a very strict view of mixed expressions. Because the expression contains a signed whole number (-2), they take the 5 to be of the signed whole number type rather than the unsigned whole number type, and the result is of the signed whole number type. On the other hand, without such context it is impossible to tell whether a symbol like 5 denotes a signed or an unsigned whole number; it can be written in either type of expression.

When two ADTs share a common range and operation and instances of their symbols can be used together in a single expression, they are called expression compatible (over the common range). Otherwise, they are expression incompatible.
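One way to picture a strict notation is to tag every value with the name of its type and refuse to combine values whose tags differ. The Python sketch below is a hypothetical model, not the rules of any particular language: the class `Typed`, the helper `literal`, and the tag names `INTEGER` (signed whole), `CARDINAL` (unsigned whole), and `REAL` are all assumptions made for illustration. A bare literal such as 5 takes its tag from the operand it is combined with, as described above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Typed:
    """A value paired with the name of the type it belongs to (hypothetical model)."""
    value: float
    type_name: str

    def __add__(self, other: "Typed") -> "Typed":
        # Strict rule: only expression-compatible (same-type) operands may be combined.
        if self.type_name != other.type_name:
            raise TypeError(
                f"expression incompatible: {self.type_name} + {other.type_name}")
        return Typed(self.value + other.value, self.type_name)

def literal(n: int, context: Typed) -> Typed:
    """Give an untyped numeric literal the type of the operand it appears with."""
    return Typed(n, context.type_name)

minus_two = Typed(-2, "INTEGER")
print(minus_two + literal(5, minus_two))   # the 5 is taken as INTEGER

try:
    Typed(4.5, "REAL") + Typed(3, "CARDINAL")
except TypeError as err:
    print(err)                             # mixing the two tags is rejected
```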

Expressions like 4.5 + 3 are also easy to handle abstractly. This evaluates to 7.5 and so is a real expression. The fact that the 3 is converted into 3.0 (from an unsigned or signed whole number to a real) is often ignored outside the computer. Within the computing environment, however, this conversion cannot be ignored, for the two data types may well be stored in very different ways and therefore be expression incompatible. In some notations, this conversion is performed automatically, making these types expression compatible (or at least making them appear to be). In others, the programmer is responsible for performing conversions when data is not expression compatible. This particular conversion is called "floating" (for converting to floating point), and the expression may be written as 4.5 + float (3).
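Python is one of the notations that performs this conversion automatically, and its built-in `float` also lets the conversion be written out, matching the `float (3)` form above. The standard `struct` module shows why the conversion matters inside the machine: the same number 3 has entirely different bit patterns as a 32-bit integer and as a 64-bit IEEE 754 double.

```python
import struct

implicit = 4.5 + 3          # Python converts the int 3 to a float behind the scenes
explicit = 4.5 + float(3)   # the same conversion, written explicitly
print(implicit, explicit, type(implicit).__name__)   # 7.5 7.5 float

# Big-endian byte patterns for 3 stored as an integer and as a real.
as_int = struct.pack(">i", 3).hex()
as_real = struct.pack(">d", 3.0).hex()
print(as_int, as_real)      # 00000003 4008000000000000
```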

Similar explicit conversions may be required when the data is known to be of one type and the result is wanted in another. For instance:

card (-3 - -7)

might be used to produce an unsigned whole number result, and

int (10 - 5)

might indicate this result is of signed whole number type.
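A conversion of this kind can be sketched as a function that checks the value against the target type's range before accepting it. Python has a single unbounded `int` type and no unsigned whole number type, so `card` and `to_int` below are hypothetical stand-ins that only mimic the idea; in particular, this `card` merely rejects negative values.

```python
def card(n: int) -> int:
    """Hypothetical conversion to an unsigned whole number: negative values are rejected."""
    if n < 0:
        raise ValueError(f"{n} is outside the unsigned whole number range")
    return n

def to_int(n: int) -> int:
    """Hypothetical conversion to a signed whole number (every whole value is accepted here)."""
    return int(n)

print(card(-3 - -7))    # 4
print(to_int(10 - 5))   # 5

try:
    card(2 - 7)
except ValueError as err:
    print(err)          # -5 is outside the unsigned whole number range
```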

It is also worth observing that although certain operators (+, -, *, /) work on several types, they mean slightly different things for each. (Internally, within the machine, they may mean very different things.)

An operator that is defined for expressions of more than one type is said to be overloaded.
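Python makes overloading easy to see: the same `+` symbol selects a different operation depending on the types of its operands, and a programmer-defined type can supply its own meaning for `+` through the `__add__` method. The `Money` class below is a hypothetical example, not taken from the text; it keeps amounts in whole cents, so its `+` is exact whole number addition underneath.

```python
class Money:
    """Hypothetical type whose + adds amounts held as whole cents."""

    def __init__(self, cents: int) -> None:
        self.cents = cents

    def __add__(self, other: "Money") -> "Money":
        return Money(self.cents + other.cents)

    def __repr__(self) -> str:
        return f"Money(cents={self.cents})"

print(3 + 4)                      # 7: whole number addition
print(3.5 + 4.25)                 # 7.75: real (floating point) addition
print(Money(350) + Money(425))    # Money(cents=775): the class's own addition
```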
