File Resizing/Corruption (AREV Specific)
At 16 SEP 1998 02:57:09PM Dianne Graveline wrote:
We have a file that is resizing itself when a PERFORM "SELECT …" and a loop and read are done in a program. Before the PERFORM SELECT, the .LK file is 35 megs and the .OV file is 15 megs. During the loop and read the .LK file gets smaller and the .OV gets larger, to the point where the .LK decreases to 1K and the .OV increases to 89 megs.
Here's the program code:

   OPEN "ITEM.MASTER" TO ITEM.MASTER ELSE STOP END
   PERFORM "SELECT ITEM.MASTER"
   EOF = 0
   LOOP UNTIL EOF
      READNEXT ID THEN
         READ IM.REC FROM ITEM.MASTER, ID THEN END
      END ELSE
         EOF = 1
      END
   REPEAT
   STOP

We originally had code after the READ, but in trying to pinpoint the problem we stripped out code until we knew exactly where the problem was occurring. If we change the PERFORM "SELECT ITEM.MASTER" line to SELECT ITEM.MASTER (an R/BASIC select) we do not have a problem - and believe it or not we can see a reverse shift in the sizes of the files: the LK starts to expand and the OV starts to shrink.
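For reference, the R/BASIC-select version that behaves correctly is identical apart from the select line (a sketch of what we are running now):

   OPEN "ITEM.MASTER" TO ITEM.MASTER ELSE STOP END
   SELECT ITEM.MASTER  ;* R/BASIC select: READNEXT pulls keys from the file directly
   EOF = 0
   LOOP UNTIL EOF
      READNEXT ID THEN
         READ IM.REC FROM ITEM.MASTER, ID THEN END
      END ELSE
         EOF = 1
      END
   REPEAT
   STOP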
We have used the above code with different files and the problem does not occur in them.
A VERIFY on the file shows no GFEs. Other statistics from the VERIFY are:
THRESHOLD = 80%
MODULO = 20552
LOAD FACTOR = 1.367 (this is alarming, since most other files are about .60 to .70)
ROWCOUNT = 97836
ROWLENGTH = 259.85
AVERAGE ROWS PER GROUP = 4.87
SIZELOCK = 0
I did a remake on the file (REMAKETABLE DATA ITEM.MASTER 97836 256 194 1024 80). I also copied the file to my hard drive, changed the DOS name, and created a new pointer in REVMEDIA, but the resizing problem still exists when the above code is run.
As a temporary solution we have changed the code to an R/BASIC select; however, I feel that there is still a problem with the file.
Does anyone have any suggestions or solutions?
We are using AREV 3.1 and Novell NetWare Client32. We also have a mixture of DOS and Win95 machines. The problem occurs when we use either a DOS or Win95 machine.
At 17 SEP 1998 05:10PM Victor Engel wrote:
I gather this is version 2.12?
If you could, perform these steps:
DUMP ITEM.MASTER
Press CTRL-H (this puts you in hex view mode).
Write down all the two-digit hex numbers you get.
Press the right arrow and write down those numbers as well.
Press ESC until you are out of DUMP.
Post the numbers that you got as well as the size of the .LK file here.
At 17 SEP 1998 06:47PM Dianne Graveline wrote:
Thanks for your help Victor.
The version of AREV is 3.1 - at least that is what is displayed when
I type WHO. I thought we had upgraded to 3.11 or 3.12 though.
Here are the hex numbers from DUMP:
1A 12 43 00 00 12 43 00 00 53 14 00 00 00 04 97

and after the right arrow:

1C 01 00 CC 00 00 57 7F 01 00 01 F5 86 32 33 30

The size of the .LK file is 5203KB.
I have another copy of this file that reduced to a size of 1K in the .LK file, but I can't even do a DUMP on it. It goes to the debugger and says !LHDUMP variable exceeds maximum length.
Also, I removed all indexing from this file manually by following the directions in Knowledgebase R56 and the problem went away. I haven't reinstalled indexes yet to know whether there is something about the index that was affecting the file or if the index was corrupt. I have btree, xref, and relational (both from and to) indexes on this file.
At 17 SEP 1998 10:03PM Aaron Kaplan wrote:
I don't see what indexes have to do with this, at least not logically. Technically, indexing is unrelated to LH structures. Even if one wants to get into the argument about LH, the active dictionary, and indexing, the hashing and auto-resizing are definitely unrelated.
I know what's happening: something is corrupting the alpha. It's one of those common-yet-rare things, meaning it doesn't happen very often, but it's not unheard of.
The mechanics are simple. Instead of recalculating used and free space, LH updates a rolling value with the amount of space used. This is what is called the alpha. The alpha value, combined with the threshold setting, is used to determine when it's time to grow or shrink the file.
LH never grows or shrinks by more than 2 groups at a time. Mostly it's only 1 group; there is code allowing for 2 groups, but only on an expansion. When it does do 2 groups, it's always on the groups it was working with, never on a random group.
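In rough pseudo-code, the resize decision looks something like this (the variable names and the exact shrink test are mine, not the real RTP57 internals):

   * Sketch only: ALPHA is the rolling byte count, never recalculated from the groups
   USED.PCT = ALPHA / (MODULO * FRAMESIZE) * 100
   IF USED.PCT > THRESHOLD THEN
      MODULO = MODULO + 1  ;* grow: split the group being worked on (occasionally two)
   END
   IF USED.PCT < SHRINK.POINT THEN  ;* some complementary lower bound
      MODULO = MODULO - 1  ;* shrink: merge the last group back
   END

So if the alpha is garbage, the file keeps "deciding" it is underfilled and shrinks on every pass, which is exactly the behaviour you're describing.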
Now, indexing is a different beast. In most ways, indexing is just another MFS, and in that respect it is no different from any other MFS you can create. However, indexing doesn't really abide by all the rules an MFS should abide by, it does do some strange things with memory, and there is logic in some code to account (or not account) for its presence. However, it does no direct manipulation of files. Everything goes through RTP57; there's not an RTP57A call in the bunch, so logically, anything it might do to a file (and all it would really do is read, never write) is no different from any standard file call.
So, we're left with PERFORM selects versus R/BASIC selects. I've got a nice little missive in SENL Volume 1, Issue 10 about the differences between these. To sum up, a PERFORM issues an R/BASIC select, reads through the list, writes the keys out to temporary LISTS records, and sets some flags for READNEXT (RTP11) to handle the processing.
The READNEXT logic is different, to a point, but only in how it retrieves the keys. In the PERFORM select, where the file is shrinking, it's not really pulling the keys from the file during the READNEXT; it's just getting the keys from the stored list. After that, it's just a simple, standard read, and LH wouldn't care what flag is set for READNEXT - or at least it shouldn't.
To prove or disprove this, you can run the tests, but without any of the processing logic - just the READNEXT. You can also try making a list active through a GETLIST (since that's what PERFORM does) and see what happens there.
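Something along these lines (a sketch; ITEM.TEST is just an example list name):

   OPEN "ITEM.MASTER" TO ITEM.MASTER ELSE STOP END
   PERFORM "SELECT ITEM.MASTER"
   PERFORM "SAVELIST ITEM.TEST"  ;* write the keys out, as PERFORM SELECT does internally
   PERFORM "GETLIST ITEM.TEST"   ;* reactivate the stored list
   EOF = 0
   LOOP UNTIL EOF
      READNEXT ID ELSE EOF = 1   ;* no READ at all - just walk the list
   REPEAT

If the file only misbehaves with the stored-list variants, that points at the READNEXT flag handling rather than at LH itself.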
I don't know of any documented LH bug that causes this, but there are some garbage-collect/flush issues with MATREAD, which is used in some indexing logic. That shouldn't come into play in a select, though, only with BTREE.EXTRACT.
I ran through your hex headers and got this:
Modulo = 5203
Framesize = 1024
Primary % = 1
Rowcount = 98135
Threshold % = 80
Frame# = 0
Group Modulo = 5203
Sizelock = 0
Group = 1
Overflow# = 0
Forward = 17170
Skip = 17170

Does it match yours?
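The arithmetic, if you want to sanity-check your other copies: the values come straight out of the header bytes you posted, read low byte first. For example:

   53 14 00 00  ->  hex 00001453  =  5203   (modulo; matches a 5203KB LK at a 1K framesize)
   57 7F 01 00  ->  hex 00017F57  =  98135  (rowcount)
   12 43 00 00  ->  hex 00004312  =  17170  (forward - and skip is the same value)
   00 04        ->  hex 0400      =  1024   (framesize)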
akaplan@sprezzatura.com
At 18 SEP 1998 12:41PM Victor Engel wrote:
The reason you are getting that error is that the file has resized to the point where almost all of the records are in overflow. The system is trying to store the list of keys in a variable; unfortunately, they will not all fit in one variable. The solution is to resize the file. However, if you try to do this, you will probably get the same error, depending (I think) on your network driver. I recommend doing this:
* Do a backup
* Get everyone off the system
* Change your network driver (in AREV) to the non-networking driver (note which driver you are currently using)
* COUNT filename
* Run your trimmed down program (hopefully it will redistribute the records appropriately at this point)
* Change back to your original driver
* Run your test again
Thanks, Aaron, for doing all the math. Your modulo matches the LK file size, given a frame size of 1K.
One other question. Do you have an MFS on this file?
At 23 SEP 1998 12:33PM Dianne Graveline wrote:
Hi Aaron,
Yes, the hex headers you calculated match the data that is displayed when I do a DUMP on the file.
We ran a process last week that does an R/BASIC select and reads every record in the file (thinking that this was okay to do). From that point on, the .LK kept shrinking and the .OV kept increasing with every hit that every user on the system was making.

We got everyone off of AREV and did a REMAKETABLE on the dictionary and data portions of the table, thinking that this could buy us some time and give us the opportunity to let users back on the system while we tried to resolve the problem. Our plan was to NOT run any processes that go through the entire file until we could resolve the problem.

This plan worked fine for a couple of days, until someone accidentally ran two processes that go through the entire file - and we discovered that our problem no longer existed. I had previously done a REMAKETABLE on the data portion only of the table (didn't do the dictionary) and it did not fix the problem, so at this point I don't know if it was a problem with the dictionary or if it was something completely different.
I have many copies of this file at different stages of corruption on my hard drive and have run many tests on it. REMAKETABLE does not seem to fix the problem on the tables on my hard drive. We discovered that anything that goes through the file and hits on every record once will trigger the file to start resizing itself for every read thereafter. If the process that is looping and reading records is stopped when the .LK file is resizing downward, and then it is started again, the .LK file will start increasing again to a certain point. Then if we run the process again, the .LK will start to shrink again.
Although the production file is stable right now, my concern is that the problem may appear again. I guess we'll keep our fingers crossed for now.
Thanks for your help Aaron - I welcome any suggestions you may have for testing the corrupted files on my hard drive.
At 23 SEP 1998 12:44PM Dianne Graveline wrote:
Hi Victor,
Thanks for your response. Before I got a chance to do what you had recommended, the resizing problem somehow got fixed. I explained this all in my last message to Aaron.
You asked, "Do you have an MFS on this file?" The answer: not one that we wrote - only what is created by the btree, xref, and relational indexes.
If the problem occurs again, we'll be sure to try your recommendation. What does changing the network driver to the non-networking driver and then back again do for the file?
Thanks again for your help!
At 24 SEP 1998 12:50PM Victor Engel wrote:
With some network drivers, if you try to do a remake on a file that has over 64K of record keys in a single group (i.e., the list of keys in the group exceeds 64K), the process will fail. In my experience, this is not true of the non-networking driver.
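To put rough numbers on it (key sizes here are an assumption; yours will differ): a variable maxes out at 64K, so at, say, 10 bytes per key plus a delimiter, a group's key list blows the limit at about 65,536 / 11, or roughly 6,000 keys. Once the LK has collapsed to a handful of groups, 98,135 rows get there easily.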
At 24 SEP 1998 01:14PM Larry Wilson, TARDIS Systems, Inc. wrote:
I would try putting a SIZELOCK on the data and ! files before running READNEXT loops. If it's too much trouble to put it in every program, you could wrap the appropriate routines with your own code, as sketched below. Setting SIZELOCK to 1 will allow growth, but prevent shrinkage. I use it when rebuilding indexes (with my own front end), and it solved all of those mysterious LK/OV file size mismatches; I also use it before any routines that read all records. Unless you know your file will actually be storing less data (I've NEVER seen that!), a sizelock of 1 makes great sense (unless someone can tell me what harm it might do).
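In outline, the wrapper looks like this (SET.SIZELOCK stands for whatever routine you use to set the sizelock value in the file header - ours is home-grown, not a shipped AREV call):

   CALL SET.SIZELOCK("ITEM.MASTER", 1)   ;* 1 = grow only, no shrinkage
   CALL SET.SIZELOCK("!ITEM.MASTER", 1)  ;* same for the ! (index) file
   * ... run the readnext loop or index rebuild here ...
   CALL SET.SIZELOCK("ITEM.MASTER", 0)   ;* 0 = normal auto-resizing
   CALL SET.SIZELOCK("!ITEM.MASTER", 0)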
Larry Wilson
TSI
tardis@earthlink.net
At 24 SEP 1998 04:03PM Victor Engel wrote:
Do you periodically purge your LISTS file or have files used for temporary transactions? These are files that shrink, and we have plenty of them.
At 19 DEC 2001 08:31AM Hippo wrote:
See the topic I opened on 12/18/2001.