13.2 Introduction to Sorting

In the last section, the efficient routine for finding data in an array was based on the premise that the data being searched was already sorted. Indeed, computers are used so extensively to process data collections that in many installations, a great deal of their time is spent maintaining that data in sorted order in the first place. It turns out that methods of searching a sorted list have much in common with methods of achieving that sorted condition.

In order to concentrate on sorting abstractions without having to be concerned with the type of data being sorted, most of the code presented in the rest of this chapter will be designed to sort only a single kind of data, namely arrays of cardinals. Only minor modifications in a few places are necessary to use the code presented in the next few sections for other kinds of data. The sorts presented here can be made generic enough to sort any kind of data, but that step requires some programming abstractions that will not be taken until chapter twelve, so the topic will be revisited in chapter thirteen with that aspect in view.

The basic premise behind sorting an array is that its elements start out in some (presumably) random order and need to be arranged from lowest to highest. If the number of items to be sorted is small, a human reader may be able to tell at a glance what the correct order ought to be. If there are a large number of items, a more systematic approach is required. To do this, it is necessary to think about what it means for an array to be sorted. It is easy to see that the list

1, 5, 6, 19, 23, 45, 67, 98, 124, 401

is sorted, whereas the list

4, 1, 90, 34, 100, 45, 23, 82, 11, 0, 600, 345

is not. The property that makes the second one "not sorted" is that there are adjacent elements that are out of order. The first item is greater than the second instead of less, and likewise the third is greater than the fourth and so on.

Once this observation is made, it is not very hard to devise a sort that proceeds by examining adjacent elements to see if they are in order, and swapping them if they are not.

13.2.1 The Bubble Sort

This most elementary sorting method proceeds by scanning through the elements a pair at a time, and swapping any adjacent pairs it finds to be out of order. Thus, for the list in the example above, the first two items are swapped, then the (new) second item is compared to the third (and not swapped,) the third is compared to the fourth, and so on to the end. The list would be altered as follows (comparisons are emphasized.)


4, 1, 90, 34, 100, 45, 23, 82, 11, 0, 600, 345
1, 4, 90, 34, 100, 45, 23, 82, 11, 0, 600, 345
1, 4, 90, 34, 100, 45, 23, 82, 11, 0, 600, 345
1, 4, 34, 90, 100, 45, 23, 82, 11, 0, 600, 345
1, 4, 34, 90, 100, 45, 23, 82, 11, 0, 600, 345
1, 4, 34, 90, 45, 100, 23, 82, 11, 0, 600, 345
1, 4, 34, 90, 45, 23, 100, 82, 11, 0, 600, 345
1, 4, 34, 90, 45, 23, 82, 100, 11, 0, 600, 345
1, 4, 34, 90, 45, 23, 82, 11, 100, 0, 600, 345
1, 4, 34, 90, 45, 23, 82, 11, 0, 100, 600, 345
1, 4, 34, 90, 45, 23, 82, 11, 0, 100, 600, 345
1, 4, 34, 90, 45, 23, 82, 11, 0, 100, 345, 600

Unfortunately, the list is not yet sorted, as there are still several places where adjacent items are out of order. The number 0, for instance, which should be first, is in the ninth slot. Notice, however, that the largest item worked its way to the top position, and indeed, this algorithm will always force this to happen. Thus, if at this point the same strategy is continued, it is only the first n-1 items that need to be scanned. On the second pass, the second largest item will move to its correct position, and on the third pass (stopping at item n-3) the third largest will be in place. It is this gradual percolation, or bubbling of the larger items to the top end that gives this sorting technique its name.

There are two ways in which the sort can terminate with everything in the right order. It could complete by reaching the n-1st pass and placing the second smallest item in its correct position. Alternately, it could find on some earlier pass that nothing needs to be swapped. That is, all adjacent pairs are already in the correct order. In this case, there is no need to go on to subsequent passes, for the sort is complete already. If the list started in sorted order, this would happen on the very first pass. If it started in reverse order, it would not happen until the last one.

The code that follows embodies one way of expressing this algorithm. It requires a separate swap procedure, which is included here for the sake of completeness, and for a reminder of how swapping is done. Here, the type cardinal is used, but other types could also be employed. Note that the array may not be full, and unused items at the end are not to be included in the sort, so a cardinal parameter must be passed to inform the bubble sort how many items are active. This procedure does no error checking on the passed parameter; it is a precondition that the number passed be valid. Also note that the array being sorted is numbered [1 .. n], [0 .. n - 1], or in general [a .. a + (n - 1)] exterior to the bubble sort procedure, but only [0 .. n - 1] inside it. The procedure will therefore be informed where to begin sorting and where to stop , both as an offset from the first index. These will be passed in parameters lBound and uBound respectively, and there will be the least confusion, of course, if the arrays are numbered by the caller as [0 .. last].

PROCEDURE Swap (VAR first, second : CARDINAL);
 
 (* Exchange values in first and second. *)

  VAR
    temp : CARDINAL;
    
  BEGIN
    temp := first;
    first := second;
    second := temp;
  END Swap;

PROCEDURE BubbleSort (VAR source : ARRAY OF CARDINAL;
      lBound, uBound : CARDINAL);

   (* sorts beginning at item # lBound and ending at item #uBound, as numbered inside the procedure [0..HIGH], not as on the outside. *)

(* Pre: uBound <= HIGH (source)
   Post : source is bubble sorted *)
   
VAR
  switch, count : CARDINAL;
  
BEGIN
  IF uBound > lBound (* otherwise, there is nothing to do *)
    THEN
      REPEAT (* start a new pass here *)
        switch := 0; (* reset number of swaps for this pass *)
        FOR count := lBound TO (uBound - 1)
        (* scan only as far as necessary *)
          DO 
            IF source [count] > source [count + 1]
            (* compare adjacent pairs *)
              THEN
                Swap (source [count], source [count + 1]);
                INC (switch);
              END; (* if *)
          END; (* for *)
        DEC (uBound); (* for the next pass, examine one less item *)
      UNTIL (switch = 0) OR (uBound = 0);
      (* till no pairs swapped, or at first *)
    END; (* if *)
END BubbleSort;

When this sort was tested, statements were included to print the array after each comparison, and to print the number of swaps on each pass. The original data is given first, and the items that are compared are emphasized.

  234   77    0  113  404   94  900  113   15  300   13  135

Pass number  1
   77  234    0  113  404   94  900  113   15  300   13  135
   77    0  234  113  404   94  900  113   15  300   13  135
   77    0  113  234  404   94  900  113   15  300   13  135
   77    0  113  234  404   94  900  113   15  300   13  135
   77    0  113  234   94  404  900  113   15  300   13  135
   77    0  113  234   94  404  900  113   15  300   13  135
   77    0  113  234   94  404  113  900   15  300   13  135
   77    0  113  234   94  404  113   15  900  300   13  135
   77    0  113  234   94  404  113   15  300  900   13  135
   77    0  113  234   94  404  113   15  300   13  900  135
   77    0  113  234   94  404  113   15  300   13  135  900
Swaps on this pass = 9

Pass number  2
    0   77  113  234   94  404  113   15  300   13  135  900
    0   77  113  234   94  404  113   15  300   13  135  900
    0   77  113  234   94  404  113   15  300   13  135  900
    0   77  113   94  234  404  113   15  300   13  135  900
    0   77  113   94  234  404  113   15  300   13  135  900
    0   77  113   94  234  113  404   15  300   13  135  900
    0   77  113   94  234  113   15  404  300   13  135  900
    0   77  113   94  234  113   15  300  404   13  135  900
    0   77  113   94  234  113   15  300   13  404  135  900
    0   77  113   94  234  113   15  300   13  135  404  900
Swaps on this pass = 7

Pass number  3
    0   77  113   94  234  113   15  300   13  135  404  900
    0   77  113   94  234  113   15  300   13  135  404  900
    0   77   94  113  234  113   15  300   13  135  404  900
    0   77   94  113  234  113   15  300   13  135  404  900
    0   77   94  113  113  234   15  300   13  135  404  900
    0   77   94  113  113   15  234  300   13  135  404  900
    0   77   94  113  113   15  234  300   13  135  404  900
    0   77   94  113  113   15  234   13  300  135  404  900
    0   77   94  113  113   15  234   13  135  300  404  900
Swaps on this pass = 5

Pass number  4
    0   77   94  113  113   15  234   13  135  300  404  900
    0   77   94  113  113   15  234   13  135  300  404  900
    0   77   94  113  113   15  234   13  135  300  404  900
    0   77   94  113  113   15  234   13  135  300  404  900
    0   77   94  113   15  113  234   13  135  300  404  900
    0   77   94  113   15  113  234   13  135  300  404  900
    0   77   94  113   15  113   13  234  135  300  404  900
    0   77   94  113   15  113   13  135  234  300  404  900
Swaps on this pass = 3

Pass number  5
    0   77   94  113   15  113   13  135  234  300  404  900
    0   77   94  113   15  113   13  135  234  300  404  900
    0   77   94  113   15  113   13  135  234  300  404  900
    0   77   94   15  113  113   13  135  234  300  404  900
    0   77   94   15  113  113   13  135  234  300  404  900
    0   77   94   15  113   13  113  135  234  300  404  900
    0   77   94   15  113   13  113  135  234  300  404  900
Swaps on this pass = 2

Pass number  6
    0   77   94   15  113   13  113  135  234  300  404  900
    0   77   94   15  113   13  113  135  234  300  404  900
    0   77   15   94  113   13  113  135  234  300  404  900
    0   77   15   94  113   13  113  135  234  300  404  900
    0   77   15   94   13  113  113  135  234  300  404  900
    0   77   15   94   13  113  113  135  234  300  404  900
Swaps on this pass = 2

Pass number  7
    0   77   15   94   13  113  113  135  234  300  404  900
    0   15   77   94   13  113  113  135  234  300  404  900
    0   15   77   94   13  113  113  135  234  300  404  900
    0   15   77   13   94  113  113  135  234  300  404  900
    0   15   77   13   94  113  113  135  234  300  404  900
Swaps on this pass = 2

Pass number  8
    0   15   77   13   94  113  113  135  234  300  404  900
    0   15   77   13   94  113  113  135  234  300  404  900
    0   15   13   77   94  113  113  135  234  300  404  900
    0   15   13   77   94  113  113  135  234  300  404  900
Swaps on this pass = 1

Pass number  9
    0   15   13   77   94  113  113  135  234  300  404  900
    0   13   15   77   94  113  113  135  234  300  404  900
    0   13   15   77   94  113  113  135  234  300  404  900
Swaps on this pass = 1

Pass number 10
    0   13   15   77   94  113  113  135  234  300  404  900
    0   13   15   77   94  113  113  135  234  300  404  900
Swaps on this pass = 0

13.2.2 The Selection Sort

It is not difficult to see that some additional efficiency can be obtained for the bubble sort. It uses many swaps to get the largest item into its correct position on each pass, and some of these are wasted. If the scan is modified so that it simply finds the largest item in the range being scanned and no interchanges are done until the scan is finished, all the intermediate swaps can be eliminated. Then, when the pass is complete, the largest item can be swapped into the top position for that pass. For instance, starting with the list

4, 1, 90, 34, 100, 45, 23, 82, 11, 0, 600, 345

one would find the number 600 to be largest on the first pass (as with the bubble sort,) but do only the swap from its present position to the last spot for the pass. Successive passes would produce the lists:

4, 1, 90, 34, 100, 45, 23, 82, 11, 0, 345, 600
4, 1, 90, 34, 100, 45, 23, 82, 11, 0, 345, 600
4, 1, 90, 34, 0, 45, 23, 82, 11, 100, 345, 600
4, 1, 11, 34, 0, 45, 23, 82, 90, 100, 345, 600
4, 1, 11, 34, 0, 45, 23, 82, 90, 100, 345, 600
4, 1, 11, 34, 0, 23, 45, 82, 90, 100, 345, 600
4, 1, 11, 23, 0, 34, 45, 82, 90, 100, 345, 600
4, 1, 11, 0, 23, 34, 45, 82, 90, 100, 345, 600
4, 1, 0, 11, 23, 34, 45, 82, 90, 100, 345, 600
0, 1, 4, 11, 23, 34, 45, 82, 90, 100, 345, 600
0, 1, 4, 11, 23, 34, 45, 82, 90, 100, 345, 600

In each pass, the number successfully placed is underlined. Here is the code to express these ideas:

PROCEDURE SelectSort (VAR source : ARRAY OF CARDINAL;
        lBound, uBound : CARDINAL);
(* Pre: uBound <= HIGH (source) and uBound > 0
    Post : source is selection sorted *)

(* sorts beginning at item # lBound and ending at item #uBound, as numbered inside the procedure [0..HIGH], not as on the outside. *)

VAR
  count, iLarge : CARDINAL;
  
BEGIN
  IF uBound > lBound (* otherwise, there is nothing to do *)
    THEN
      REPEAT  (* start a new pass here *)
        iLarge := uBound;
        (* assume last index is of the largest one *)
        FOR count := uBound TO lBound BY -1
        (* scan from current end of list *)
          DO
            IF source [count] > source [iLarge]
            (* look for anything larger *)
              THEN
                iLarge := count;
              END; (* if *)
          END; (* for *)
          IF iLarge # uBound
          (* if equal, its already in right spot *)
            THEN
              Swap (source [uBound], source [iLarge]);
            END; (* if *)
          DEC (uBound);  (* for next pass examine one less item *)
        UNTIL uBound = lBound;
    END; (* if *)
END SelectSort;

Notice that a small change was made in the structure of the sort. The scan for the largest on each pass is conducted starting at the last index rather than at the low end. This is done to maximize the efficiency of the sort in cases where the list is already sorted. The index iLarge would in such cases hold the position of the largest item right from the start of the scan; no changes of index would take place, and there would be no swap. Of course, if the list were reverse sorted to begin with, this choice would reduce efficiency, but that possibility is less likely than that it may be sorted prior to beginning.

This procedure was run with the same test data as the bubble sort and with suitable print statements to monitor the progress of the sort. It produced the output below, with the original data first. The item correctly placed on each pass is emphasized.

  234   77    0  113  404   94  900  113   15  300   13  135

Pass number  1
  234   77    0  113  404   94  135  113   15  300   13  900
Pass number  2
  234   77    0  113   13   94  135  113   15  300  404  900
Pass number  3
  234   77    0  113   13   94  135  113   15  300  404  900
Pass number  4
   15   77    0  113   13   94  135  113  234  300  404  900
Pass number  5
   15   77    0  113   13   94  113  135  234  300  404  900
Pass number  6
   15   77    0  113   13   94  113  135  234  300  404  900
Pass number  7
   15   77    0   94   13  113  113  135  234  300  404  900
Pass number  8
   15   77    0   13   94  113  113  135  234  300  404  900
Pass number  9
   15   13    0   77   94  113  113  135  234  300  404  900
Pass number 10
    0   13   15   77   94  113  113  135  234  300  404  900
Pass number 11
    0   13   15   77   94  113  113  135  234  300  404  900

Contents