The Linear Hash Filing System

Published By	Date	Version	Knowledge Level	Keywords
Revelation Technologies	31 AUG 1989	1.1X	INTERMEDIATE	SYSTEMS, FRAMESIZE, LINEARHASH, MODULO, OVERFLOW, RECORDCOUNT, RESIZE, THRESHOLD, SIZELOCK

This presentation details features and functions of the Linear Hash filing system used by Revelation Technologies' Advanced Revelation data base product. The discussion will include advantages that Linear Hash offers over indexed filing systems and traditional hashed filing systems, as well as the basics of Linear Hash file structure and function.

Typical data base filing systems implement a fixed length structure. When a new file is created, it is necessary to determine the exact number of fields that will comprise a record, and their corresponding length. Data that exceed the specified field lengths are truncated. Empty fields, and those fields not requiring the full specified length, are padded, wasting storage space. Since the file uses a fixed length structure, any changes to this structure require that the file be recreated.

These filing systems typically rely upon a primary index for record retrieval. A separate index file must be searched to locate a records position in the file storing the records. For any record read, there will always be at least one disk access for the index file, and another to retrieve the record. This presents a formidable performance problem as the size of the file grows.

Records in an indexed file must be of a fixed length. Field lengths and record structure must be determined at the time of file creation. This requires that some data be truncated and other data be padded with spaces in order to preserve the integrity of a file. Variable length records in an indexed file would require that the entire file be reorganized and the index file rebuilt any time a record was changed.

Last, indexed filing systems are not efficient in a network environment. Indexed filing systems cannot support record locking in the index file. In order to protect a given record in use, it is often necessary to lock an entire branch of the index file. This can make hundreds of records inaccessible to other users on the network.

Hashed filing systems provide a partial solution. Hashed filing systems locate records based upon computations performed against a unique identifier in the record known as the "primary key." This calculation process is known as "hashing". Hashing eliminates the need to look up the records location in an index file if the key is known.

Hashed filing systems divide the file into logical buckets. The key of each record is passed through a hashing algorithm. The hashing algorithm is a program that manipulates the record key in a predefined manner to determine to which bucket the record belongs. Hashing algorithms depend on both the key and the number of buckets in the file.

Hashing is very efficient, as once the bucket is determined, only those records in that bucket need be searched for a key match. Since the location of the record is computed rather than referenced in an index file, the record access time is no longer bound to the file size.

Hashed filing systems make variable length records possible. As a record's length is changed, only its bucket must be rewritten. There is no index file that needs to be maintained.

Hashed filing systems can be very efficient on a network. Since the location of a record is calculated, there is no need to lock an index file. A lock on the bucket to which a record hashes preserves the integrity of the record, while leaving the remainder of the file free for other users.

Traditional hashed systems do suffer drawbacks, however. When unique keys hash to the same bucket, a collision is said to take place. This is not inherently bad, but since a file is created with a fixed number of buckets, the incidence of collisions increases with the total number of records in the file. If the bucket is not large enough to store all of the records whose keys hash to this bucket, the excess must be stored as chained overflow buckets. Overflow directly impacts the record access time, as overflow increases the number of buckets that must be read and searched for a key match.

Alternatively, if the file is created with too many buckets, the overflow problem is avoided, but disk space is wasted with empty buckets.

Either of these problems can be resolved by recreating the file, but this is a cumbersome solution. First, the file has to be periodically monitored for overflow. Second, in order to recreate the file, it is necessary to create a new file with the correct number of buckets and then copy all of the records over. There must be sufficient storage space for both the new file and the old file during this process. Clearly, recreating files is costly in terms of personnel, computing time and storage resources.

Another common problem with hashed filing systems is related to the size of the bucket versus the size of the records. As collisions occur, the number of records stored in a bucket may get very large. This increases the number of records that must be searched for a key match. Consider the scenario where the bucket is very large, and records are very small. Even though no overflow may be present, the performance is hindered by searching within the bucket. This is somewhat like looking for a particular Smith J. in the New York City phone book the phone book might be turned to the right page, but a lot of Smith J.'s must be read through before the correct one is found.

Revelation Technologies has solved these problems with a revolutionary filing system used by its flagship database product, Advanced Revelation. Linear Hash (short for linear hashing with two partial expansions, separate chaining, distributed control and variable length records) is a hashing filing system that offers:

Automatic file resizing to manage overflow. The number of buckets is not fixed.
Distributed control of the hashing algorithm over a local area network.
Variable length records.
Optimized performance with the ability to specify bucket size during file creation.

Linear Hash files are self optimizing. The filing system will automatically resize the file as required. It is not necessary to anticipate the eventual size of the file when it is created, nor is it necessary to constantly monitor the file for overflow.

Unlike indexed file systems, which require a minimum of two disk accesses per read, records in a typical Linear Hash file can be read in an average of only 1.24 disk accesses. Linear Hash files maintain uniform record access time regardless of the number of records stored.

A Linear Hash file is divided into buckets known as groups. Each record will hash to one group. A group is comprised of subunits called frames. A group will always have a primary frame (the first frame in the group). It may also have linked overflow frames. The storage that is comprised of all primary frames is called the primary space. The overflow is comprised of all overflow frames. Refer to Figure 1.

Linear Hash allows specification of the frame size when the file is created. Linear Hash files may be created with the frame size ranging from 256 bytes to 10,000 bytes. This allows optimization of file performance for different environments. Optimizing frame size (a topic beyond the scope of this discussion) requires consideration of several factors including the number of records and the average record size, disk transfer speed and access time, physical disk sector size and cluster size, and the proportion of read, write and delete operations that will be done on the file. By default, Linear Hash files are created with a frame size of 1024 bytes. All frames in a Linear Hash file (link and overflow) are of equal size.

Threshold, a measure of the primary space used, can also be specified to optimize file performance. Threshold can be set during file creation, or any time thereafter. Threshold is the percentage of primary space that must be filled to trigger file expansion. File contraction occurs when primary space utilization falls 10 percent below the threshold level.

In essence, the threshold setting is an indicator of the file's ability to add records with minimum overflow. A steadily growing file might be more efficient with a low threshold, as it would maintain a larger amount of available primary space. A higher threshold would be suitable for a more static file, allowing a larger amount of primary space to be used before resizing. The default threshold value for Linear Hash files is 80 percent.

If the records in the file all occur within primary space, there is no need for overflow frames. If there is no overflow necessary, the overflow file takes up no storage space.

As collisions occur, however, the filing system may need to allocate and link overflow frames to the group to accommodate the records. These frames are stored in the overflow file (.OV extension see Linear Hash Structure, below). When possible, the overflow frames will be allocated so that frames of the same group are contiguous. Overflow frames may later be emptied when the file resizes or data are deleted. If empty frames occur at the end of the overflow file, they are truncated.

The first frame of the overflow file contains the first frame of the overflow frame free list, which helps to efficiently manage the overflow. This list is a sorted list of pointers to empty overflow frames. As frames are emptied, corresponding pointers are added to the free list. If the size of the free list exceeds one frame, the filing system will create links to other frames in the overflow file, where the list will be continued.

As overflow frames are added or removed from a group, the forward and skip pointers in the affected groups are updated. The forward pointer gives a pointer into the next overflow frame. The skip pointer gives a pointer to the frame where the next record begins. The skip pointer allows searches to jump over frames containing only the overflow of the current record.

Unlike typical hashed filing systems, Linear Hash automatically manages its overflow. The performance of files with excessive overflow suffers as more reads and searches are required to locate a key within a group. Linear Hash's automatic file resizing eliminates the need to constantly monitor the overflow condition of files.

The filing system makes its decision to expand or contract the file based on the percentage of primary space that is full. The header of the Group 1 primary frame contains a count of the total number of bytes used in primary space. This is updated during buffer flushing operations, or when a workstation happens to access Group 1. If the primary space utilization is either above the threshold, or 10 percent below the threshold, then the file should be resized.

As a workstation accesses a file, it keeps track of how it has impacted the primary space usage. If it senses that the file should be resized, it first flushes its record of the changes to primary space into the counter in Group 1, and re-evaluates the need to resize.

When the decision to resize has been made, the workstation sets the sizelock in group 1 to prevent other workstations from resizing the file. If the workstation is able to set the sizelock properly, the resizing process begins.

The following is an explanation of Linear Hash expansion. Contraction is performed by reversing the process.

The basic strategy of Linear Hash expansion is to split existing groups to create new ones. As a file expands, a group is added, and the hashing algorithm is revised. A number of records from selected groups are then moved into the newly created group, based on the new hashing algorithm.

When distributing records into new groups, the filing system treats the file in logical fractions to assure even record redistribution. When an expansion causes the number of groups to be doubled, the filing system logically divides the file into halves. As records need to be distributed from existing groups into new groups, each corresponding group in each logical half is selected in turn. A number of the records from each of the selected groups is then moved. By the time the last groups in the logical halves have been selected for redistribution, the size of the primary space has tripled. The file is then divided into logical thirds, and the process repeats. As the last groups in the logical thirds has been visited, the primary space has once again doubled, and the file is again divided into logical halves. Figure 2 illustrates this process.

The control of the hashing algorithm is distributed among the workstations on a LAN. That is, a workstation does not need to go to a central location to retrieve the latest modulo for the hashing algorithm every time it needs to access a record. This improves performance by limiting the number of disk accesses.

When a group is involved in a split or recombination, the new number of groups, or modulo, in the file is recorded in the header of that group. The modulo is also recorded in the header information of the primary frame of Group 1 after any change in the number of groups.

When a file is opened by the workstation, the current file modulo is stored in memory. It uses that modulo to hash to a particular group. When it reads that group, it rehashes the key based on the modulo stored in the header of that group. If the key still hashes to that group, it seeks the record. If the key does not hash to this group, the current modulo is read from the Group 1 header, the new modulo is stored in memory, and the key rehashed.

If the key hashes to the group using the modulo in the group header, it does not necessarily mean that this is the current modulo. It is, however, sufficient for at least this access.

The Linear Hash file is stored as two operating system level files. The primary frames are stored in one, and all overflow frames are stored in the other. The naming convention used by Advanced Revelation is REVnnnnn.LK for the primary frame file and REVnnnnn.OV for the overflow file. nnnnn represents a five digit integer, which is the same for a given .LK and .OV pair.

Each frame in a file contains information about the file and the frame within the frame header. Although all frames are of the same size, each frame type has a unique header structure. The different header types are diagrammed in Figure 4. The structures within the headers are detailed here:

TYPE

Contains the length of the header, which identifies the header type and starting data byte.

FORWARD

Indicates a forward pointer to next chained overflow frame.

SKIP

Pointer to the overflow frame where the next record begins. Since variable length record structure allows records to occupy more than one frame, the skip pointer improves the efficiency of searches by allowing skips of frames that contain only a portion of a record whose key has already been read.

MODULO

The number of groups in this file. Group 1 modulo is always updated for any resizing. Other groups modulos indicate the number of groups in the file at the time that this group was last involved in a split or recombination.

FRAMESIZE

The number of bytes that each frame occupies.

GROUP

Indicates the group to which an overflow frame belongs.

ALPHA

Number of bytes used in primary space. Alpha is used to approximate space usage to evaluate the need to resize. Alpha is updated by workstations when Group 1 is accessed.

THRESHOLD

The percentage of primary space that must be filled to trigger file resizing. It is stored as a single byte, which is calculated:

THRESHOLD=((percentage full)/100) * 255

The filing system determines the size of primary space by calculating:

primary space = (THRESHOLD/255) * FRAMESIZE * MODULO

The resulting figure is compared with ALPHA to evaluate the need to resize.

SIZELOCK

Determines if a file may be resized.

Value	Meaning
0	file may expand or contract
1	file may not contract
>1	file may not expand or contract

RECCOUNT

The number of records stored in this file. Allows for quick retrieval of the approximate number of records on file without having to read through the entire file. The RECCOUNT information is updated during normal I/O operations, and is checked and updated during file operations where all keys in the file are read.

RECCOUNT occurs in the header of the primary frame of group 1 in files created with Advanced Revelation Version 1.1 and higher.

SIZE

The number of free frames in the free frame list. The last item in the list always points to the frame beyond the last frame in the overflow file, which will be created as needed.

NEXTLIST

Used when the overflow free list exceeds one frame. Points to the frame containing the next segment of the free list.

FREE

Elements of an array of sorted pointers into free frames. The size of the array depends upon frame size. These elements are not the actual overflow frames, but pointers to frames.

Records are packed into a group end-to-end. Each record is terminated with an FFH byte. The group is terminated with a 80H byte.

Each record begins with a sequence of bytes that stores the record length and the key length. The record and key lengths are stored as one or more bytes each. Each byte is a seven bit value, with the most significant bit serving as a terminator flag, indicating the end of the sequence. Only bits zero through six are significant to the length. The bytes are arranged with the most significant byte occurring first.

To illustrate how this scheme is employed, consider the byte sequence extracted from a frame as illustrated in Figure 3.

In this example, the record length consists of two bytes: 02 81. The second byte has bit seven set, indicating that it is the last byte in the record length sequence. The most significant byte occurs first, so the record length is calculated:

02  shifted left seven bits       = FFH
81  strip bit seven (AND with 7F) = 01H
                                    -------
record length equals the sum        100H

This also points to the End Of Record marker from the first byte of the id length sequence. This can be used to check the integrity of the pointers in the record by verifying that the FF occurs where the pointers indicate. If the pointers do not correctly point to an End Of Record mark, the filing system reports a format error.

Advanced Revelation's Linear Hash filing system provides an ideal solution to file management problems with variable length records, self optimizing files, and efficient operation on networks. Linear Hash eliminates the need for system administrators to monitor overflow. Distributed control of the hashing algorithm and file resizing optimizes performance in LAN environments.

Figure 1

                    Group 1     Group 2      Group 3
                                                         --
Primary Space       -------     -------     -------        |
                   |Primary|   |Primary|   |Primary|       |  F
                   |frame  |   |frame  |   |frame  |       |  I
                    -------     -------     -------        |  L
                       |           |           |           |  E
Overflow            --------    --------    --------       |
                   |Overflow|  |Overflow|  |Overflow|      |
                    --------    --------    --------     --

Figure 2

Primary Frame Group 1

Byte #            1                        2
  0|1234   |5678|9012   |34   |5678 |9    |01  |2345    |6789->
  t|forward|skip| modulo|frame|alpha|thres|Size|Record  |Data
  y|       |    |       | size|      hold |Lock|Count   |
  p|       |    |       |     |                 Arev1.1+|
  e|       |    |       |     |                 only    |

Primary Frame (other than Group 1)

(Frame 1 - n of .LK file)

Byte #            1
  0|1234   |5678|9012  |3456->
  t|forward|skip|modulo|Data
  y|       |    |      |
  p|       |    |      |
  e|       |    |      |

Overflow Freelist

(Frame 0 of .OV file)

Byte #           1             2
  0|12  |3456|7890 |1234 |5678 |9012->
  t|size|next|free1|free2|free3|etc.
  y|    |list|     |     |     |
  p|    |    |     |     |     |
  e|    |    |     |     |     |

Overflow Freelist

(Frame 1 - n of .OV file)

Byte #            1
  0|1234   |5678|9012  |3       |45678->
  t|forward|skip|Group |Reserved|Data
  y|       |    |      |        |
  p|       |    |      |        |
  e|       |    |      |        |

Figure 3

                          FF  02   81   82   38   34   41   42   FE   (...)
                          |    |    |    |    |    |    |   |     |
end of previous record ---+    |    |    |    |    |    |   |     |
most significant byte  --------+    |    |    |    |    |   |     |
  of record length                  |    |    |    |    |   |     |
next and last byte of  -------------+    |    |    |    |   |     |
  record length                          |    |    |    |   |     |
first and last byte    ------------------+    |    |    |   |     |
  of id length                                |    |    |   |     |
first byte of id       -----------------------+    |    |   |     |
next and last byte     ----------------------------+    |   |     |
  of id                                                 |   |     |
first byte of record   ---------------------------------+   |     |
  field one                                                 |     |
second byte of record  -------------------------------------+     |
  field one                                                       |
field delimiter        -------------------------------------------+

Figure 4

W,X,Y,Z are individual frames, letters are used to simply show how breaks are made in file resizing.

                                   X
Group added
Group doubled logical keys         XY

First group of each half visited   XYZ
Logical thirds

First group of each third visited  XXYY
Logical halves

First group of each half visited   XXYYW

Next group of each half visited    XXYYZZ
Logical thirds

First group of each third visited  XXYYZZW

Next group of each third visited   XXXXYYYY
Logical halves

First group of each half visited   XXXXYYYYW

Next group of each half visited    XXXXYYYYWW

Next group of each half visited    XXXXYYYYWWW

Next group of each half visited    XXXXYYYYZZZZ
Logical thirds

First group of each third visited  XXXXYYYYZZZZW

Next group of each third visited   XXXXYYYYZZZZWW

Next group of each third visited   XXXXYYYYZZZZWWW

Next group of each third visited   XXXXXXXXYYYYYYYY
Logical halves

The Linear Hash Filing System

Filing Systems

Hashed Filing Systems

Linear Hash Files

Overflow Handling

Automatic Resizing of Linear Hash Files

Distributed Control of Hashing

Linear Hash File Structure