14.6 An Introduction to Table Indexing Methods

A number of other topics related to tables are somewhat more advanced, and will not be considered here in any more detail than will assist the reader in consulting further references. It should be noted, however, that as a table grows very large, the sequential access method implied by the linked structure becomes a hindrance. If the table is sorted, possible solutions include:

If the maximum size of the table is known ahead of time, it could be implemented as an array to speed access.
In either case, the table could be indexed and a separate array or linked list contain the indices into the table. For instance a table ordered alphabetically could have a separate index table by the first letter of the key so that a search for "Xanadu" would start at the beginning of the "X" entries.
One could go further than this and have a separate linked list for each letter of the alphabet. Conceptually, one could regard the data structure as a linked list of linked lists (or of queues or stacks, if appropriate). This could be done for any appropriate index. For instance the employees of an organization could be organized by department. Each department name provides an index to a linked list of the people in that department.

The maintenance of a table of starting points of separate linked lists is called the indexed sequential access method (ISAM), although this term is most commonly used specifically for the indexing of sequential files.

Sometimes the indexing into a table is done dynamically by performing some computation based on the key. In the case of alphabetically sorted data, this could be an extension of the last two ideas. For instance, if d is the distance in letters of a letter from "A," then the formula

26 * d(first letter) + d(second letter)

Any method of computing directly from the search key at run time an index into a table of data is called hashing; a table indexed in this way is called a hash table; and the function employed for the calculation is called a hash function.

Alphabetically ordered items are not spread evenly through the letters; they are clustered heavily in some parts. For instance, a telephone book may contain many last names "Kim," "Smith," "Wong," and "Friesen," but not very many "Sutcliffe," "Hacker," "Zinzelmeyer," "Szczepanczyk," or "Herod." Moreover, the nature of the clustering would not be the same in every country (not many named "Smith" in China; mostly "Kim" in Korea, and lots of "Sutcliffe" in Yorkshire). Thus, the development of effective and efficient hash functions to index large tables depends very much on the data at hand, and may change over time as the data is maintained. The reader is invited to consult an advanced work on these subjects.

Contents