
At 27 AUG 2007 04:41:01PM David Nicol wrote:

I am trying to create a report using 3 tables with one table being accessed by Reduce, Select By, and Readnext. I essentially followed the code in Help, but it hangs on the Readnext statement. The 3 columns specified in the reduce_script all have btree indexes. The table contains about 750,000 records and the Reduce and Select By seem to function OK. The code is below. Can anyone tell me where I'm going wrong?

For i=1 to 8

ClearSelect i

next i

*

Flag="

done=FALSE$

REDUCE_SCRIPT = 'WITH {PROPERTY} EQ "' : S.Property : '" AND WITH {ACCOUNT} EQ "' : Selected_Account : '" AND WITH {DATE} GE "' : Fiscal_Start : '"'

*Msg(Reduce_Script,"","","")

Mode=1

REDUCE(REDUCE_SCRIPT, SORT_LIST, MODE, FILE_NAME, CURSORVAR, FLAG)

If Flag Else

Msg("Failed to Reduce","","","")
Return

End

SELECT FILE_NAME BY SORT_LIST USING CURSORVAR ELSE

Msg("Failed to Sort","","","")
Return

End

*

OPEN "",FILE_NAME TO ENTRIES_VAR ELSE

Msg("Failed to Open","","","")
Return
End

Loop

READNEXT KEY USING CURSORVAR BY AT ELSE Done=TRUE$

At 27 AUG 2007 04:52PM Bob Carten wrote:

Are you using OpenInsight 8.01?

We made a change in 8.x related to selecting non-zero cursors. The 8.0 version was buggy, you want 8.01.

- Bob


At 27 AUG 2007 05:11PM David Nicol wrote:

Actually, I'm still on V4.13


At 28 AUG 2007 03:45AM The Sprezzatura Group (www.sprezzatura.com) wrote:

The code looks OK, but we have two suggestions.

The first is to move the open to before the select and reduce calls.

The second is to check @FILE.ERROR after the select and reduce calls.
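
For example, a minimal sketch of both suggestions (reusing the variable names from the code above; from memory, field 1 of @FILE.ERROR carries the error code after a filing operation, so treat that layout as an assumption):

   Open "", FILE_NAME To ENTRIES_VAR Else
      Msg("Failed to Open","","","")
      Return
   End
   @File.Error = ""   ; * clear before the call so a stale code is not reported
   Reduce(REDUCE_SCRIPT, SORT_LIST, MODE, FILE_NAME, CURSORVAR, FLAG)
   If @File.Error<1> Then
      * a non-null code here means the engine flagged a filing error
      Msg("Reduce error: " : @File.Error<1>, "", "", "")
      Return
   End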

The Sprezzatura Group

World leaders in all things RevSoft


At 29 AUG 2007 02:54PM David Nicol wrote:

I moved the OPEN before the REDUCE. I checked and the status comes back '0' on both the REDUCE and SELECT BY and it still hangs on the READNEXT.

I'm not able to see what is happening with OpenEngine because it won't display. I have let it run for up to 15 minutes and all I get is an hourglass. The only option seems to be to close OpenEngine to stop the process.


At 29 AUG 2007 03:08PM [email protected] wrote:

What are the LK and OV sizes?

[email protected]

The Sprezzatura Group Web Site

World Leaders in all things RevSoft


At 29 AUG 2007 05:09PM David Nicol wrote:

REV39517.LK=75,621 KB

REV39517.OV=37,499 KB

From LH Table Info:

Modulo=75621

Record Count=0

Framesize=1024

Threshold=80

Size Lock=0


At 29 AUG 2007 05:13PM [email protected] wrote:

Well with a record count of 0 it's busy reading all the frames looking for a record…

[email protected]

The Sprezzatura Group Web Site

World Leaders in all things RevSoft


At 29 AUG 2007 09:03PM David Nicol wrote:

I copied the records from the file into a temporary file and then deleted the original. I created a new file, checked it, and it verified. I then copied the records from the temporary file into the new file and rebuilt the indexes. LH Table Info still shows 0 records; however, the Verify Table status report shows the correct record count. I don't know why LH Table Info shows '0' for Records, or whether there is a way to force it to update. Any suggestions?


At 29 AUG 2007 09:47PM Barry Stevens wrote:

Do an RLIST select in the system editor exec line?

(I think this was an old arev fix, so might not apply)


At 30 AUG 2007 06:11AM The Sprezzatura Group (www.sprezzatura.com) wrote:

What happens if you remove the sorting? Depending on how the sort logic is set up, the system might resolve the select at the initial readnext call, which could cause it to hang.

The Sprezzatura Group

World leaders in all things RevSoft


At 30 AUG 2007 11:55AM Richard Hunt wrote:

As you did before… create another file. Copy the rows to the new file. Test your SELECT / READNEXT procedure before creating indexes. If that works then add one index at a time until the problem happens again.

Otherwise, if the problem happens again without adding any indexes, I would consider looking at all dictionary symbolics that are being used. It is possible that they were programmed to "endless loop".


At 04 SEP 2007 07:48PM David Nicol wrote:

On the newly created file without indexes, it worked OK (very slow). I added btree indexes to Property (data column) and Account (data column) and it still worked (not as slow). When I added the btree index to Date, which is also a data column in the table, it hung again. I stopped it after 40 minutes. The report would not be useful if selecting an average of 40 records from the 750,000 record file stays at about 10 minutes without the date index. The entire report needs to perform this select approximately 120 times (once for each property), and I was hoping for each selection to take 2 minutes or less. Any ideas on how to get the ReadNext to work with the date field indexed?


At 05 SEP 2007 04:35AM The Sprezzatura Group (www.sprezzatura.com) wrote:

As a testing tool, what happens if you use a BTREE.EXTRACT on the date values? Is it still slow?
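
For instance, a rough sketch (from memory the search string is the indexed field name, @VM, the value(s), @FM, with matching keys handed back in the fourth argument; an exact-match lookup on one internal-format date is shown, since we won't trust memory on the range syntax here):

   Declare Subroutine Btree.Extract
   Open "DICT", FILE_NAME To DICT_VAR Else Return
   * Fiscal_Start is already in internal date format in your code
   SEARCH_STRING = "DATE" : @VM : Fiscal_Start : @FM
   Btree.Extract(SEARCH_STRING, FILE_NAME, DICT_VAR, KEYS, "", FLAG)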

The Sprezzatura Group

World leaders in all things RevSoft


At 05 SEP 2007 05:11AM The Sprezzatura Group (www.sprezzatura.com) wrote:

Also consider using a composite symbolic and indexing that

@Ans={PROPERTY} : "*" :{ACCOUNT} : "*" {DATE}

The Sprezzatura Group

World leaders in all things RevSoft


At 05 SEP 2007 05:12AM The Sprezzatura Group (www.sprezzatura.com) wrote:

Without the missing colon of course ;)
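
That is:

   @Ans = {PROPERTY} : "*" : {ACCOUNT} : "*" : {DATE}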

The Sprezzatura Group

World leaders in all things RevSoft


At 05 SEP 2007 10:13AM Warren Auyong wrote:

I seem to recall a similar thing happening on our main datafile. I added a btree to speed up some reports and it would hang up somewhere, either in run_report or maybe a readnext loop. I never did solve the problem other than removing the index. I do not recall if it was a date field or not.

I shall have to search my posts, I do not recall if I posted about it or not.


At 05 SEP 2007 02:05PM Richard Hunt wrote:

Try making the date index a crossreference index rather than a btree index. See if that changes things. Is the date field multivalued? What does the date look like in the table row? Is it ICONVed, or rather… does it look like 08/01/2007 or 14458?


At 05 SEP 2007 06:19PM David Nicol wrote:

The date field is ICONVed, so it shows something like 14567.


At 05 SEP 2007 07:55PM Barry Stevens wrote:

What happens if you only BTREE the date?

If that works then select pass 1=date

select pass 2=rest of selection fields

Try a single select with all selection fields first and time it, then split into pass 1 and 2 as above, and use the fastest.


At 05 SEP 2007 07:59PM Barry Stevens wrote:

Before you try my last test/method, what happens if you only have DATE in the selection command? If that works, then make sure any selects on that file that use date are split into 2 selects, with the first select using the indexed field that will return the least number of records.
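
One way to sketch the split (expressed here with RList rather than the cursor-based Reduce used above, so the table name MY_TABLE is illustrative; a second RList select refines the current active list):

   Declare Subroutine Rlist
   $Insert RLIST_EQUATES

   * pass 1 - select on the single indexed field expected to return the fewest keys
   Rlist('SELECT MY_TABLE WITH {DATE} GE "' : Fiscal_Start : '"', TARGET_ACTIVELIST$, "", "", "")

   * pass 2 - refine the active list with the remaining criteria, sorted
   Rlist('SELECT MY_TABLE WITH {PROPERTY} EQ "' : S.Property : '" BY {DATE}', TARGET_ACTIVELIST$, "", "", "")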


At 12 SEP 2007 05:44PM David Nicol wrote:

I thought about BTree.Extract but felt I would have to sort the results before printing.

I tried @Ans={Property} :"*": {Account} and it still hung on the ReadNext when the Reduce_Script included the date. I looked at the dictionary and saw that the column {Date} showed as a SYN for column 1, so I tried using {Transaction_Date}, which is defined for the same column. Then I tried changing the Reduce_Script to include only {Property} and {Account}, figuring I would just skip over the records that were not in the date range I wanted. It works, but slowly. The first three ReadNext commands respond quickly; after that, there is at least a 2 minute delay between each execution of the ReadNext. Can anyone suggest why the ReadNext has such a slow response time?

This is the code:

For i=1 to 8

ClearSelect i

next i

*

Flag="

done=FALSE$

OPEN "",FILE_NAME TO ENTRIES_VAR ELSE

Msg("Failed to Open","","","")
Return
End

OPEN "DICT",FILE_NAME TO DICT_VAR ELSE

Msg("Failed to Open","","","")
Return
End	

REDUCE_SCRIPT=WITH {PROPERTY} EQ ": S.Property:" AND WITH {ACCOUNT} EQ ":Selected_Account

Mode=1

REDUCE(REDUCE_SCRIPT, SORT_LIST, MODE, FILE_NAME, CURSORVAR, FLAG)

If Flag Else

Msg("Failed to Reduce","","","")
Return

End

SELECT FILE_NAME BY SORT_LIST USING CURSORVAR ELSE

Msg("Failed to Sort","","","")
Return

End

*

First_Rec=1

Loop

READNEXT KEY USING CURSORVAR BY AT ELSE Done=TRUE$

Until done

Read Record From Entries_Var, Key Else
	MSG("%1% is missing","","",Key)
End

*

* REMOVE APPROPRIATE VALUE FROM MULTI-VALUED COLUMN(S)

*

E.AtID= Key

S.Description=Record

S.Date= Record

S.Account= Record

S.Amount= Record

Debug

If S.Date LT Fiscal_Start then Goto Skip


At 12 SEP 2007 08:48PM Karen Oland wrote:

The delay is probably from a deferred resolution of the full sort/select order.

Use btree extract to get the smallest key set, then use rlist to finish selecting and sort at the same time, or use btree extract to get a final key set - you can then sort with v119.
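
As a rough sketch of the v119 route (the 'S' sort mode and the ascending/left-justified arguments shown are from memory, so treat them as assumptions):

   Declare Subroutine Btree.Extract, V119

   Btree.Extract(SEARCH_STRING, FILE_NAME, DICT_VAR, KEYS, "", FLAG)
   Convert @VM To @FM In KEYS    ; * keys come back @VM-delimited, V119 wants @FM
   V119("S", "", "A", "L", KEYS, FLAG)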

Using rlist() is easier, though perhaps not the fastest.

If you have knowledge of how your data is spread across your indexes, you can use that to greatly reduce select times and file i/o.

Just as an aside, a quick rewrite of a report can sometimes speed execution by a factor of 10 or 100, with a corresponding decrease in file i/o. We had one report, old compiled code generated by a LIST statement, that turned out to do an MV sort with a break value that was an xlate on a LARGE mv field; with 500 to 1000 items in the field, each value was doing 500-1000 xlates to resolve the current "whichvalue" in the report. An afternoon of stripping out the generic extractions of data from dictionary calls took the report from an hour or more down to a couple of minutes. We have seen similar gains from changes in selecting records after our app had been in place for many years (despite the archiving ability, few use it, so some data sets grew quite large after 10 or so years). Knowing that the data fit a certain profile helped: we wanted all currently active orders, and 99.99% of all orders ended before today, so selecting first all those ending after today and then narrowing down on the other criteria, rather than selecting on all the criteria and the starting/ending date at once, meant the select took a few seconds instead of running so long the user would cancel the program without an answer, or just never use that feature.


At 12 SEP 2007 09:21PM The Sprezzatura Group (www.sprezzatura.com) wrote:

Long delays on ReadNext generally come down to very sparse files with lots of overflow frames needing compressing, but I think we've already established that this isn't the case here?

The Sprezzatura Group

World leaders in all things RevSoft


At 13 SEP 2007 12:21PM Richard Hunt wrote:

From what I understand about overflow frames… compressing overflow frames just removes the unused frames. I believe unused frames will be used again when needed; there is a list of unused frames at the beginning of the "OV" file.

Would compressing overflow frames really speed things up? Aren't unused overflow frames really just a waste of disk space?

Whether the overflow frames are read sequentially or not will not save that much time either way; they are still read one at a time. Maybe I am wrong. Maybe disk caching would be affected by the sequential issue.


At 14 SEP 2007 04:56PM David Nicol wrote:

I tried Btree.Extract and found that it was somewhat faster: about 4 minutes, compared to the reduce-select time of 1.5 hours, to extract approximately 30 records from the 760,000 record file. I thought I had read somewhere that Btree.Extract would give the results in the order of the index, which in this case would be the date. Apparently not so. I will check into V119 to see how to apply that subroutine to give sorted results, but I'm still concerned about the time. Does anyone have a suggestion on how to make this perform better?


At 15 SEP 2007 11:51AM Richard Hunt wrote:

Your reduce script criteria has three separate fields (PROPERTY, ACCOUNT, and DATE); that is too much for the first SELECT statement. I believe this will not allow any of the indexes to be used, and it will have to select by reading the whole table.

What you need to try is having your first select use only one indexed field, choosing the one that will produce the smallest result set. And do not sort the first select.

Have you tried this???

REDUCE_SCRIPT=WITH {PROPERTY} EQ ":S.Property

or

REDUCE_SCRIPT = 'WITH {ACCOUNT} EQ "' : Selected_Account : '"'

or

REDUCE_SCRIPT = 'WITH {DATE} GE "' : Fiscal_Start : '"'

then continuing with…

Mode=1

REDUCE(REDUCE_SCRIPT, SORT_LIST, MODE, FILE_NAME, CURSORVAR, FLAG)

SORT_LIST="

SELECT FILE_NAME BY SORT_LIST USING CURSORVAR ELSE

Msg("Failed to Sort","","","")

Return

End

Try this and report back the time it takes to do each one of the three different REDUCE statements with the unsorted SELECT statement.

After you have timed the first select, you can then add the remaining REDUCE criteria and SORT criteria to the second select criteria.
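
A quick way to time each variant (Time() returns seconds since midnight, which is good enough here; the sequence mirrors the code above, with the sort list cleared before the calls):

   SORT_LIST = ""   ; * unsorted, per the advice above
   START_TIME = Time()
   Reduce(REDUCE_SCRIPT, SORT_LIST, MODE, FILE_NAME, CURSORVAR, FLAG)
   Select FILE_NAME By SORT_LIST Using CURSORVAR Else Return
   Msg("Select took " : (Time() - START_TIME) : " seconds", "", "", "")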


At 17 SEP 2007 05:40PM David Nicol wrote:

I will try this approach and post the results.

I'm not sure, but my problem does not appear to be with the reduce/select. If I 'debug' after the reduce/select, the time to get to that spot in the code seems fairly quick. However, the time from one readnext statement to the next appears to get progressively longer. The first 3 records read fairly quickly, but from the 4th onward the time is very slow.


At 17 SEP 2007 06:21PM Karen Oland wrote:

As Richard has said, first find out which index yields the lowest number of records. Use it to select first, then further reduce that list using the other two fields and your sort (this group gets read in its entirety and the indexes are not used at all). This can often be faster than using three indexes, where the results must be merged and one of the indexes returns a huge number of keys while at least one other returns only a few (if the data set is small enough, it turns out to be faster just to read the records from the one index).

I tend to just use rlist() to do it all (when being lazy) but you could continue with the reduce syntax or use btree.extract and then read/select/sort the rest yourself.

And if the dataset is small enough, you "could" (but not necessarily should) create a relational index on that date field (yielding results in seconds, which you then further compare). Or, you could create a symbolic that has the three fields together

{date}:"*":{field2}:"*":{field3}

and then index it - leaving a quick extract from it, assuming you are looking for only a few dates and one value each from field2 and field3 (more searches would mean a little more code, but again btree.extract would quickly find them – assuming the indexes didn't take forever to update the first time you did a search). And of course, with some applications, a relational index of this symbolic to a "index" table would mean finding the data could be done very, very quickly (at the cost of perhaps lots more space – probably a bit more than simply a btree index on the above).

Space vs. time is always a tradeoff. If you need near instant access (and can code the concatenation to do date ranges or multi values of the other two fields in your lookup), a relational index to a hard table is hard to beat. It also takes up a lot of room and can be a bear if it ever has to be rebuilt. But this is basically what many SQL databases are doing when building their group indexes.
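
For instance, a lookup against such a composite index might look like this sketch (the symbolic name DATE_F2_F3 is made up for illustration, and the lookup value must be concatenated exactly as the symbolic builds it):

   * build the same concatenation the symbolic produces, then hit its index
   LOOKUP_VAL = Iconv("09/01/2007", "D") : "*" : FIELD2_VAL : "*" : FIELD3_VAL
   SEARCH_STRING = "DATE_F2_F3" : @VM : LOOKUP_VAL : @FM
   Btree.Extract(SEARCH_STRING, TABLE_NAME, DICT_VAR, KEYS, "", FLAG)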


At 18 SEP 2007 09:25AM Gerald Lovel wrote:

The concept of an INDEX table for relational indexes could be generalized completely as follows:

1) Define the "RELATIONS" table to hold all relational indexes.

2) Add a "TABLENAME" symbolic to all data dictionaries. I won't describe the code, but it is relatively easy.

3) Create relational indexes on symbolic columns, where the code would be

 @ANS={TABLENAME}:"*":{DATA1}...

4) Build relational indexes from symbolic columns to table "RELATIONS".

5) Pray the RELATIONS table never gets corrupted.

I think step 5 is what keeps most of us from pursuing relational indexing.


At 18 SEP 2007 10:41AM Karen Oland wrote:

Sounds like empty overflow frames – they all have to be read, empty or not.

Or, the select has found the first one or two and then restarts the select process from the beginning each time, finding its current location and then returning the next. Not that anyone would code this way (but, you never know)!


At 18 SEP 2007 10:55AM Karen Oland wrote:


No doubt. I came in on a large system as it was being replaced by Oracle (I was there for several years, while a dozen Oracle employees tried to make their big system compute what the little AREV system did). Unfortunately, they refused to install the NLM for a long time, and every time it got busy a GFE would pop up. I spent many a weekend rebuilding indexes, including one monster relational one. And literally the entire weekend, bring-a-sleeping-bag style: the process had to be monitored the whole time, as it could crash as well, and we only had a bit over 60 hours to run to completion (if it ran OK, it usually finished up in the wee hours of Sunday morning).

Luckily, computers are faster and the NLM/NT Service/UD all reduce the need for this type of rebuild (but not ones due to users rebooting while an index is updating, resulting sometimes in missing nodes in the index). When the UD can do the re-index itself, then the time for a rebuild should be much reduced (even better if it can do it as a side process while still serving requests for the rest of the tables, only keeping perhaps one process offline during a rebuild – and in a relational table, that should only be occasioned by a massive server crash).


At 18 SEP 2007 04:39PM Richard Hunt wrote:

I must be confused… Overflow frames are pointed to, and each overflow frame points to the next overflow frame. I cannot see where the system has to read any empty overflow frames.

Overflow frames are handled by pointers. The primary (or LK) frame has a pointer to the overflow frame (that is if one exists for the primary frame) in byte offset 2 and is 4 characters long. Each overflow frame has the next overflow frame in byte offset 2 and is 4 characters long. Using the pointer and the frame size, the linear hashed file handler calculates the exact offset in the file where the next frame is. I know of no case where there is an empty overflow frame included in this chain.

Unless you mean that the system has to jump to offset 409,600 or offset 40,960, I really do not think it takes much of any time to read from either offset. Maybe I am confused.
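
For anyone who wants to look for themselves, a throwaway sketch along these lines (the frame-offset arithmetic is an assumption for illustration, not gospel; only the pointer layout comes from the description above):

   * peek at the 4-character overflow pointer of one frame, per the layout above
   OSOpen "REV39517.LK" To LK_FILE Else Return
   FRAME_SIZE = 1024
   FRAME_NO   = 0    ; * first primary frame, counting from zero (assumed)
   OSBRead HEADER From LK_FILE At (FRAME_NO * FRAME_SIZE) + 2 Length 4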


At 20 SEP 2007 05:27PM David Nicol wrote:

Here are the results. All three of the select/reduce steps are very quick (seconds), but in each case the first ReadNext statement takes about 2 minutes for the first record; the second and third records take seconds. The fourth record takes about 2 minutes, and then the fifth takes 10 minutes. On the 6th, I got impatient after 10 minutes and killed it.

Karen Oland suggested using Rlist(Select…, but I don't believe I can do that, because I use Rlist(Select on the primary file for this report and need access to the cursors so as not to lose the ability to read the next property from the property file.

The best that I have been able to come up with is to Reduce/Select on the Property and Account, which gets me approximately 100 records from the 750,000 record file, and then process all the dates, just skipping over the ones not needed.

The reason for attempting this approach was that the original version of the report had the 750,000 record file as the primary file and used an Rlist(Select… approach. That worked well and ran quickly, but it would not print a report for a property that had no activity (i.e., no selected records).
