
At 12 MAR 2003 09:33:25AM Glenn C Harris wrote:

We are trying to read data from a dBase file into an AREV file. The key to each record is 20 characters long, and each record is approximately 220 bytes.

We read each record from the dBase file and then write it to the AREV file. After 3120 records have been written, the AREV file breaks with the message:

Fatal error writing 10000000000000000001 in file XYZ. The total length of all the key ids in the group exceeds the maximum variable length.

Doing a Dump shows a Group Modulo of 2, Framesize=1024, Size lock=0

All records are in group 1. If we change the threshold and then compress the file, the records are placed in many groups. At this point we can write as many records to the file as we want… it appears to be fixed.

If, however, we clear the AREV file and attempt the process again, we get the same error. Clearing the file appears to put it back into the condition that causes all records to be written to the same group.


At 12 MAR 2003 11:37AM Hippo wrote:

I've reported a bug in the incorrect recomputation of the ALPHA value during SELECT ALL (a READNEXT loop) in AREV 3.12 … the FMC pointer in /FF/ gets cut off by the TCL SELECT ALL "program". That is a similar topic (automatic resizing problems with ALPHA and THRESHOLD), but this seems to be different.

Can you trace your program and check how ALPHA and the main MODULO change during the insert process?

Does ALPHA always correspond to the actual contents of the .LK file during the process? What is the ALPHA/FILESIZE ratio?

Try the HASH() function to check whether all the records should belong to group 0 (according to the current main MODULO) — see the sketch below.

Is it a coincidence (a special key structure)? Does the behaviour change if you change the insert order, or if you convert the keys with a one-to-one function?
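For illustration, the kind of distribution check suggested above can be sketched in Python. The toy_hash below is a stand-in, not AREV's real group assignment (which, as Victor notes later in the thread, lives inside RTP57 and is not the R/Basic HASH() function):

# A minimal sketch of the distribution check, assuming a stand-in hash.
def toy_hash(key, modulo):
    return sum(ord(c) for c in key) % modulo + 1   # groups numbered from 1

keys = ["%020d" % n for n in range(1, 3121)]       # 20-character keys, as in Glenn's import
counts = {}
for key in keys:
    group = toy_hash(key, modulo=2)                # the modulo reported by DUMP
    counts[group] = counts.get(group, 0) + 1
print(counts)   # a healthy hash spreads keys over both groups;
                # Glenn's symptom is every key landing in group 1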

Sorry … no answers, just questions


At 12 MAR 2003 11:52AM Don Miller - C3 Inc. wrote:

When you're doing this process, you should pre-size the file and set the sizelock to prevent it from being resized during a batch-add process. A newly created file can put everything into the .OV portion in the same group. This can be particularly troublesome with numeric keys.

If you can reset the parameters in DUMP and then add records successfully, that is what has happened: the resize / re-order process redistributed the keys properly. If you do a CLEARFILE, the file will be set back to the old allocation.

Your import process should create the data portion of the file at approximately its final size and set the sizelock / threshold parameters before you import the records.
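As a rough illustration of that pre-sizing arithmetic, using Glenn's figures (the 80% target fill and the per-record overhead are assumptions for the example, not values taken from AREV documentation):

# Back-of-the-envelope pre-sizing for a batch import, along the lines Don describes.
RECORDS    = 3200        # expected number of records
REC_BYTES  = 220 + 20    # approximate record body plus 20-character key
FRAME_SIZE = 1024        # frame size reported by DUMP
TARGET     = 0.80        # assumed target fill of the primary (.LK) space

data_bytes = RECORDS * REC_BYTES
modulo = -(-data_bytes // int(FRAME_SIZE * TARGET))   # ceiling division
print(modulo)   # pre-size the file to roughly this many groups,
                # then set the sizelock before importing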

HTH

Don M.


At 12 MAR 2003 12:52PM Hippo wrote:

Thank you, Don, for explaining good programming practice.

On the other hand … this does not answer the questions bothering me.

Why does automatic resizing not apply here? When is it triggered?

Which operation is responsible for changing the main modulo?

Why does the HASH function hash all keys to group 0 even when there are 2 groups (… if there are 2 of them)?

The answers would probably explain WHY the programming practice you mention is a must.


At 12 MAR 2003 01:37PM Don Miller - C3 Inc. wrote:

If you search this site you will find a number of posts on this subject. There was an excellent one by Aaron Kaplan (Sprezz) quite a while ago. Basically, I think auto-resizing doesn't always work properly when a new file is created and numeric keys are batch-added. My recollection is that they will all hash into one or two frames, which will then auto-resize, or can be manually resized, to redistribute the keys.

Way back in the old AREV days, I used to create new files whose modulo was always a prime number when numeric keys were to be added via import. I think the hashing algorithm was fixed / improved sometime around version 2.1x, although I might be way off base.
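A quick demonstration of why the prime-modulo habit helps with patterned numeric keys. Plain key-mod-modulo is used here as a stand-in for the real hashing, purely to show the effect of a shared factor:

# Patterned numeric keys versus the modulo (illustrative stand-in hashing).
def tally(keys, modulo):
    counts = [0] * modulo
    for k in keys:
        counts[k % modulo] += 1
    return counts

keys = range(10, 10001, 10)   # numeric keys with a stride of 10
print(tally(keys, 10))        # modulo shares a factor with the stride: all 1000 keys in group 0
print(tally(keys, 7))         # prime modulo: the same keys spread almost evenly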

I'm sure that someone else will jump in on this topic with more info.

Don M.


At 12 MAR 2003 02:06PM Hippo wrote:

OK, thanks Don again.

Does that mean there is no problem in 3.12?


At 12 MAR 2003 02:13PM Don Miller - C3 Inc. wrote:

Hippo ..

I don't know for sure about AREV 3.12. Frankly, if it ain't broke, I don't fix it. The old way seems to work fine, so I haven't tested whether the problem resurfaces in 3.12. Sorry about that.

Don M.


At 12 MAR 2003 03:12PM Hippo wrote:

OK, thanks for your time


At 12 MAR 2003 06:11PM Victor Engel wrote:

Some of what I explain here may not be technically accurate, but it is accurate enough for this explanation, which is designed to answer your "why". The hashing algorithm is essentially random: it takes a key and returns a group number. Resizing is triggered by comparing the file's primary% statistic to its threshold value. You can see these statistics when you DUMP a file. The primary% is a measure of what percentage of the .LK file is filled up.

Now, then. If you have a brand-new file with a modulo of 1, the primary% exceeds the threshold pretty quickly. At the point when a record is written that causes the threshold to be exceeded, a resize will be triggered, provided the sizelock is less than 2.
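That trigger can be written out as a one-line predicate (a sketch restating the description above; the exact internal formula is an assumption):

# Resize trigger, as described above (not AREV source code).
def needs_resize(used_lk_bytes, modulo, frame_size, threshold_pct, sizelock):
    primary_pct = 100.0 * used_lk_bytes / (modulo * frame_size)
    return sizelock < 2 and primary_pct > threshold_pct

# A brand-new file with modulo 1 trips the trigger almost immediately:
print(needs_resize(900, modulo=1, frame_size=1024, threshold_pct=80, sizelock=0))   # True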

The resize operation increments the modulo and then redistributes a selection of the groups (in the case of modulo change from 1 to 2, all groups).

Now our hashing algorithm uses a modulo of 2 instead of 1 to determine which group to place the existing records into. Suppose the "random" number generated just happened to always be 1. In that case, although the resize was triggered, nothing is actually redistributed to other groups, because the hashing algorithm maps every record to the first group.

You can confirm that this is what is happening by dumping the file. All your data will be in the first group. The next group will be empty.

At this point, primary% is 50 because one half of the primary space (the first group) is full (REALLY full) and the other half is completely empty. That means no further resizing will occur until something gets written into the second group.

Besides the previous recommendations, which are good, simply writing a record that hashes to the second group when the modulo is 2 and whose record size is greater than the frame size is sufficient to get things moving smoothly again. Once the resize (to a modulo greater than 2) has taken place, the record can likely safely be deleted.

This is only a problem when lists of keys all hash to group 1 when the modulo is 2. Well, technically, it would also be a problem if all keys hashed to group 1 or 2 when the modulo is 3, but experience has shown this doesn't happen.
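The whole stall can be simulated in a few lines. The stand-in hash below is deliberately pathological at modulo 2, mirroring the scenario described above; none of this is AREV's actual code:

# Simulating the resize stall with a pathological stand-in hash.
FRAME, THRESHOLD = 1024, 80              # frame size and threshold, as in Glenn's DUMP

def stuck_hash(key, modulo):
    return 1 if modulo == 2 else key % modulo + 1   # every key lands in group 1 at modulo 2

modulo, used = 1, [0]                    # bytes written per group
for key in range(1, 3121):
    used[stuck_hash(key, modulo) - 1] += 240        # ~240 bytes per record
    primary = 100.0 * sum(min(u, FRAME) for u in used) / (modulo * FRAME)
    if primary > THRESHOLD:
        modulo += 1                      # the resize fires once, from modulo 1 to 2...
        used.append(0)                   # ...but the stuck hash moves nothing into group 2

print(modulo, [u // FRAME for u in used])   # -> 2 [731, 0]
# Group 1 holds ~731 frames' worth of data (overflowing into .OV) while group 2
# stays empty, so primary% is pinned at 50 and no further resize ever fires.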


At 12 MAR 2003 08:58PM Hippo wrote:

Thanks Victor,

I assumed this behaviour; it seems to me I understand what ALPHA means.

(I was not sure whether a sequence consisting only of write BFS commands triggers resizing, so thanks for making me sure.)

I'd change just two little details in the explanation.

1. HASH should be fully determined by the key and the modulo, otherwise we could not reliably access the same record twice (and hashing would make no sense).

OK … "random" here means some function whose details we don't care about …

2. ALPHA is a bit higher than 50%, as there is some header info in the 2nd group … (just to keep it simple … OK)
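In numbers (the size of that header info is a guess here, just to show the direction of the correction):

# ALPHA just after the stalled resize: a bit above 50%, not exactly 50%.
FRAME = 1024
HEADER = 10   # assumed bytes of frame-header info in the otherwise empty group
print(100.0 * (FRAME + HEADER) / (2 * FRAME))   # 50.48..., slightly above 50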


According to Don's recollection, the problem was with the hashing function of 2.12, which chooses the first group of the two very often.

That fully explains WHY.

(It was one of the things to test that I suggested to Glenn.

And I hope 3.12 has a better HASH function.)


At 13 MAR 2003 04:20PM Victor Engel wrote:

I don't think 3.12 and 2.12 have different hashing algorithms. If they did, they'd have trouble reading each other's files. ROS and RTP57 do use different algorithms, and the HASH R/Basic function is the one used by ROS. The only version difference I know of among RTP57 files is the addition of extra header data with version 1.1+. The additional header information includes a record count, which was not previously available.


At 14 MAR 2003 04:19AM Hippo wrote:

I am planning to do some experiments with HASH().

The compatibility argument came to my mind too, but I had no experience with the older versions.

This means Don's method is a must.

Thanks to Glenn for pointing out the problem, and to both of you, Don and Victor.

I prefer to understand all the bugs/"properties" of the environment I am using. It helps me find the best workarounds (Don already knows them :)).


At 17 MAR 2003 10:55AM Hippo wrote:

I've done tests on the HASH function and it seems to work well even for small modulos. I've checked the filling of the first group in an empty table (using the following code).

It gave strange results when I forgot to FLUSH. After adding the FLUSH it seems to work fine.


*PROGRAM testing problems during intensive filling of an initially empty table
* replace KEY, RECORD …. rename HASH_TEST
DECLARE SUBROUTINE MSG
PERFORM "CLEARTABLE HASH_TEST (S)"
PERFORM "PDISK C:\TEST (OS)"
PRINTER ON
OPEN 'HASH_TEST' TO TEST_HANDLE ELSE STOP
DOS_NAME = TEST_HANDLE[-1,'B:']
DOS_NAME = TEST_HANDLE[LEN(TEST_HANDLE)-LEN(DOS_NAME)-1,2] : DOS_NAME
OSOPEN DOS_NAME TO LKFILE ELSE STOP
RECORD = 'TST'
FOR KEY = 1 TO 1000
   OSBREAD FRAME FROM LKFILE AT 0 LENGTH 1024
   FILLED = FRAME[-1,'B':CHAR(128)]
   F_LEN = 1024-LEN(FILLED) ; * following line isn't needed in our test
   STAT = CHAR(MOD(F_LEN,256)) : CHAR(INT(F_LEN/256))
   IF STAT = FRAME[16,2] THEN OK = '==' ELSE OK = '##'
   CONVERT CHAR(0) TO '' IN FILLED ; IF LEN(FILLED) THEN F_LEN = 1024
   PRINT FRAME[1,26] : OK : STAT
   * 2 lines of hexa dump per write
   WRITE RECORD TO TEST_HANDLE, KEY ELSE MSG('error writing')
   FLUSH ; * without this the log does not correspond exactly …
   * filled from single machine? Otherwise without flush … alpha cannot
   * be maintained correctly
WHILE SEQ(FRAME[10,1]) = 1
NEXT KEY
PRINTER OFF
PERFORM "PDISK PRN (S)"
STOP


Glenn: Do you FLUSH after updates? Do you fill the table from a single source? Can you run the code to test the problem?

The resulting C:\TEST log file interests me.


At 17 MAR 2003 12:09PM Victor Engel wrote:

RTP57 does NOT use the HASH function. The HASH function is the algorithm used by ROS.


At 18 MAR 2003 05:04AM Hippo wrote:

OK, so I was totally off topic. :))

The compatibility issue means … AREV 3.12 can read ROS files using the original algorithm, but applies a totally different algorithm to Linear Hash files.

A funny way to nowhere.

Sorry for boring you.
