Indexing Techniques (TB#25) (Functions/Subroutines/Programs)
Created at 12 AUG 1996 03:23PM
Indexing Techniques
Indexes are used in Advanced Revelation to speed reports and to allow access to records via non-key data. This bulletin will explore techniques that allow more efficient use of indexes. The information in this article is intended to supplement the information found on pp. 165-167 of the Version 1.1 Addendum.
Sorting vs. Reduction
If a SELECT statement (or a select performed as part of an R/LIST sentence) does both sorting and reduction, using two indexed fields, the system will use indexes to satisfy the sort, not the reduction. In the following statement the CITY index will be used, not the ST index:
LIST SALES WITH ST = 'WA' BY CITY
The rationale for this is simple: when users execute a sentence it is assumed they want something to appear on the screen as soon as possible. If indexes are used for reduction, the sort will need to be completed before any results can be printed to the screen. If the sort is given preference, the system can determine which records should be displayed as they are read from disk; data thus appears on the screen sooner.
The total time to completely display a report in this fashion may be slower than when doing the reduction first. However, the user is able to examine data much earlier and has the overall impression of speed.
There is one case where reduction is always done first. If the report is being directed to a printer, the user is probably not concerned about when the report starts, but about when the report finishes. Therefore it is important to complete the report as quickly as possible. When a report goes to the printer, indexes are used for reduction, not sorting.
If the selection will return less than half the records in a file, it can be faster to do the reduction first, then the sort.
The developer can instruct the system to use indexes to satisfy the reduction criteria, rather than the sort, by using the REDUCE keyword:
SELECT SALES WITH ST = 'WA' BY CITY REDUCE
Indexing Multivalued Fields
If an index is to be placed on a multivalued field and the goal is to index each value, that is, individual values will not be further split into words, then the index method of choice is Btree. Btree indexes automatically parse on value marks, so there is no reason to go through the additional overhead of a Cross Reference index just to instruct the system to parse on value marks.
Indexes and Sorting
If an index is used to satisfy range selection criteria (>, <, ⇐, >=, FROM..TO), the keys will come back in sorted order. This makes it unnecessary to include sorting as part of the SELECT statement when sorting on the same field used by the WITH clause, or by a symbolic field based on that field.
In the following example, MONTH is a symbolic field that extracts the month from the SALE.DT field:
LIST SALES WITH SALE.DT >= '5/1/89' BY MONTH SALE.DT
This sentence could be rewritten and achieve the same result:
LIST SALES WITH SALE.DT >= '5/1/89' SALE.DT
By avoiding a sort, the time required to produce the report is reduced.
Nested Sorts
At times it is desirable to sort first by one field, then by another. For example, the primary sort may be by STATE and the secondary sort by CITY. That is, the report generates a list sorted by states, and within each state the cities are sorted.The R/LIST report that produces this report is:
LIST SALES BY STATE BY CITY
Even if both STATE and CITY are indexed, only the STATE index will be used to satisfy the sort. This is because the system is not aware of the relationship between cities and states.
In the CITY index there may be an entry for MIAMI. There may be multiple keys for that city. However, there is no way for the system to know whether the keys are for Miami, Florida or for Miami, Ohio. The CITY index cannot be used to satisfy the secondary sort.
A way around this problem is to create a symbolic field that concatenates the two fields together, then index that symbolic field. For example, a field called STCITY could be created with the following formula:
@ANS = {STATE} : {CITY}
The index would be built with values such as WABELLEVUE, FLMIAMI, OHMIAMI, etc. This index could then be used to speed the sorting in the report.
LIST SALES BY STCITY STATE CITY
While this technique can be carried to as many sort levels as needed, there are some restrictions.
First, it cannot be used for sorts where the primary and secondary sorts go in opposite directions. That is, a symbolic field could not be constructed that would allow an index to be used for a report where STATE is in ascending order and CITY is in descending order.
Second, the developer must be careful when using this technique with numeric fields. The symbolic built to do the indexes should pad out all numbers so they are the same length. As an example, assume that the requirement is for a primary sort on STATE and a secondary sort on PHONE. The symbolic field needs to account for 7 digit phone numbers and for 10 digit phone numbers.
@ANS = {STATE} : {PHONE} "R(0)#10"
Through the careful manipulation of indexes, access to data can be simplified and speeded. The developer should take the time to analyze his reporting requirements and
build indexes to support those requirements.