
At 28 OCT 2005 03:09:39PM Martin Bell wrote:

Arev 3.12

Novell 4.2

NLM 1.5

We have a set of master-detail files with 500,000 and 5,000,000 records respectively (an average ratio of 1:10). On any given day we might have to delete 10,000 - 20,000 master records (about 200,000 detail records). This is very time consuming, in the range of about 4 minutes per 4,000 records. Note that I'm obtaining the keys of the detail records to delete and looping through them with an RBasic DELETE statement, without record locking, since I'm putting an exclusive lock on the file.
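In outline, that delete loop is something like the sketch below, assuming the detail keys have already been collected into an @FM-delimited list (file and variable names are placeholders, not the actual code):

    * Sketch only - DETAIL and DETAIL.KEYS are placeholder names
    OPEN '', 'DETAIL' TO DETAIL.FILE ELSE
       PRINT 'Unable to open DETAIL'
       STOP
    END

    DETAIL.KEYS = ''   ;* @FM-delimited keys of the detail records to delete
    NUM.KEYS = COUNT(DETAIL.KEYS, @FM) + (DETAIL.KEYS NE '')
    FOR I = 1 TO NUM.KEYS
       DELETE DETAIL.FILE, DETAIL.KEYS<I>
    NEXT I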

At first, I wondered if the costly part was looking up (via BTREE.EXTRACT) the detail records to delete. So I installed a relational index to keep track of that for me. This resulted in very little improvement.

Next, I speculated that maybe a delete was too costly. So I added a logical delete field to the detail record and just marked the record as 'deleted'. I'm not sure there was any change with that approach; the WRITEV took about as long.
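The logical-delete variant amounts to something like this sketch, where DELETED.FLAG.POS is a made-up attribute number for the new flag field:

    * Sketch of the logical delete: flag the record instead of deleting it
    OPEN '', 'DETAIL' TO DETAIL.FILE ELSE
       PRINT 'Unable to open DETAIL'
       STOP
    END

    DELETED.FLAG.POS = 25   ;* assumed position of the logical-delete field
    DETAIL.KEYS = ''        ;* @FM-delimited keys, as in the sketch above
    NUM.KEYS = COUNT(DETAIL.KEYS, @FM) + (DETAIL.KEYS NE '')
    FOR I = 1 TO NUM.KEYS
       WRITEV 'D' ON DETAIL.FILE, DETAIL.KEYS<I>, DELETED.FLAG.POS
    NEXT I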

Any programmatic suggestions? Or is this just the price of doing business?

(Yes, I recognize that certain system/network improvements would increase performance, but this is already a 15K SCSI Ultra 320 RAID 5 configuration running 100Mb Ethernet on a Win98 Pentium IV desktop.)

Thanks in advance.


At 28 OCT 2005 03:42PM Mike Ruane wrote:

OpenInsight has been faster than Arev for some time now; is upgrading an option?


At 28 OCT 2005 03:56PM [email protected]'s Don Bakke wrote:

Martin,

Have you tried size locking the table before doing your deletes?

If a complete upgrade to OI is not possible, then perhaps you could install OI just for utility maintenance purposes like this.

[email protected]

SRP Computer Solutions, Inc.


At 28 OCT 2005 04:37PM R Johler wrote:

We have an Arev process that finds about 20,000 master records that are ready to be archived. We update the master record to show 'archived', and part of the process is to read each detail record, write it to an archive detail file, and delete the original detail - about 200,000 of these. We call this "Archiving".
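In outline, that per-detail step is just a read, a write to the archive detail file, and a delete; a rough sketch (file names and the key are placeholders):

    * Sketch: copy one detail record to the archive file, then delete the original
    OPEN '', 'DETAIL' TO DETAIL.FILE ELSE STOP
    OPEN '', 'DETAIL.ARCHIVE' TO ARCHIVE.FILE ELSE STOP

    DETAIL.ID = '12345*001'   ;* placeholder detail key
    READ DETAIL.REC FROM DETAIL.FILE, DETAIL.ID THEN
       WRITE DETAIL.REC ON ARCHIVE.FILE, DETAIL.ID
       DELETE DETAIL.FILE, DETAIL.ID
    END ELSE
       NULL   ;* detail record already gone
    END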

Part of our particular problem is finding the masters in a file of 5 million master records - so much of our time is spent finding the candidates.

We run "archiving" on "batch processor". The batch processor loops checking for the jobs' start time, locks the job control record containing the program name and runs it. The locking is important as we have 6 pc's running "batch processor" every night doing many timed jobs. Only one pc at time may run any given job.

It takes 4 to 5 hours - but we don't care. No one has to start it, and it sets itself up to run the next day. Our only "to do" is that someone has to start the "batch processor" jobs on those 6 PCs before they leave for the day.

Then we have a manually run process, run several times every day, that checks the control records and notifies us if a job was skipped. We don't have to remember to check whether something ran or not.


At 28 OCT 2005 05:25PM R Johler wrote:

How much faster do you think? Is there a number for my boss to see? Especially if this comparison was done on Netware servers…


At 28 OCT 2005 08:25PM Warren Auyong wrote:

What happens if you use BATCH.INDEXING?

Or create a file pointer in REVMEDIA that doesn't have indexing on it and open that "file" to run the deletes?
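If the second idea is used, the delete loop simply opens the unindexed pointer instead of the real table; something like this sketch, where DETAIL.NOIDX is a hypothetical REVMEDIA entry pointing at the same LH data as DETAIL (the real table's indexes are then stale until rebuilt):

    * Sketch: route the deletes through a hypothetical unindexed alias
    OPEN '', 'DETAIL.NOIDX' TO NOIDX.FILE ELSE
       PRINT 'Unable to open DETAIL.NOIDX'
       STOP
    END

    DETAIL.KEYS = ''   ;* @FM-delimited key list, built elsewhere
    NUM.KEYS = COUNT(DETAIL.KEYS, @FM) + (DETAIL.KEYS NE '')
    FOR I = 1 TO NUM.KEYS
       DELETE NOIDX.FILE, DETAIL.KEYS<I>
    NEXT I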


At 31 OCT 2005 12:46PM Victor Engel wrote:

I suspect the biggest impact on performance is indexing. You may want to reconsider how you implement indexing, possibly rolling your own.

You can do deletes using a handle that doesn't have SI.MFS attached. The problem with that, though, is that you create a situation where the indexes are out of sync. You'd either have to be comfortable with having out-of-sync indexes for a time, or you'd need to rebuild the indexes after a delete session. I suspect the latter keeps this option from being viable.

You could use BATCH.INDEXING to get some measure of indexing improvement.

You could continue with the built-in indexing functionality and use the SI.MFS-less idea above, plus a function to update the indexes more efficiently. This would require you to learn the indexing structure and which locks to use.

Victor


At 03 NOV 2005 04:49AM Hippo wrote:

A single insert/delete into/from a B-tree should take $C_1 \log n$ time (though it can be deferred to a background process), while a rebuild of the B-tree should take $C_2\, n \log n$. I suppose $C_2 \ll C_1$, and therefore a "batch" insert/delete of at least a $C_2/C_1$ fraction of the table is faster if we remove indexing before the update and reinstall it afterwards.

Can you approximate the $C_2/C_1$ ratio?
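To make the break-even point explicit (using only the two cost models above): deleting $k$ of $n$ records with the index in place costs about $k\,C_1 \log n$, while dropping the index and rebuilding it once costs about $C_2\,n \log n$, so the drop-and-rebuild wins when

\[
  k\,C_1 \log n \;>\; C_2\,n \log n
  \quad\Longleftrightarrow\quad
  \frac{k}{n} \;>\; \frac{C_2}{C_1}.
\]

For example, if $C_2/C_1$ were about $0.04$ (a made-up figure), then deleting more than roughly 4% of the table in one batch would justify dropping the index first; 200,000 of 5,000,000 detail records is exactly 4%.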


But if I understand it correctly, even when the records are not deleted (just marked), the process takes a long time. It seems that indexing is not the most time-consuming part of the process.
