
At 17 FEB 1999 10:06:21PM Don Bakke wrote:

We are currently working on a project where we have to export several tables, each with a million-plus rows, to ASCII files, while at the same time normalizing all the multi-values into sub-tables. We wrote a utility that is supposed to run unattended and export every table in series.

What we are encountering, however, is a dramatic slowdown after the first table has been extracted. The first table took about 6 hours to complete (this includes its various sub-tables), but the pace of the second table told us it would take 100 hours!

We aborted the process, thinking we had better get some better hardware to run this. In the process we found that SYSTEMP had developed a GFE, which we fixed by clearing the table. On a whim we decided to run the export utility again, starting at the second table, and lo and behold the pace picked up again and the table completed in 12 hours. Quite a difference.

So…is there something we need to do within our utility that can keep this process running at peak performance? Is SYSTEMP getting blown from this process? Would FLUSH or GARBAGECOLLECT help at all?

Would this kind of drag affect importing records as well? Another one of our clients imports about 500,000 cellular call records on a monthly basis. We notice a significant drag after the first 300,000-400,000 records. Can this be improved in any way?

TIA for any ideas here.

[email protected]

SRP Computer Solutions


At 17 FEB 1999 11:04PM Victor Engel wrote:

In my opinion, importing and exporting would drag for different reasons. When importing into an AREV table, you will encounter a lot of I/O as the file resizes itself, unless you presize the file.

I don't know what could be causing the slowdown on the export, unless it is related to a select somehow. The SYSTEMP file is used for temporary storage, and should clear out for you as you gracefully conclude a process. Remember also that if you have your environment set up to remember saved queries, when you perform a select, it has to "forget" the query that drops off the bottom of the query list. If that one happens to be a really big select list, it will take a long time to forget (perform a DELETELIST, basically). This doesn't really fit your symptoms, though. Perhaps you had a sizelock on SYSTEMP greater than 2. I don't know. At this point I'm grasping at straws on little sleep (hey, hallucinations can be good to inspire creativity sometimes).


At 17 FEB 1999 11:16PM Ron Wielage wrote:

Don,

I've been involved in large data exports from AREV and have not experienced the degradation you mention. These were exports of multiple millions of records, from dozens of files to dozens of ASCII files. We used SELECT FILEVAR rather than PERFORM "SELECT " : FILENAME, so no key list was resolved before the output started. Sorry I'm of no help.
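As a rough sketch of the difference (the table name is a placeholder and error handling is stripped to a bare STOP), the cursor-style loop looks something like this:

   * PERFORM "SELECT CUSTOMERS" would resolve an active key list before
   * the loop starts; SELECT on the file variable instead opens a cursor
   * on the file itself and streams keys as the loop runs.
   OPEN "CUSTOMERS" TO hTable ELSE STOP
   SELECT hTable

   Done = 0
   LOOP
      READNEXT Key ELSE Done = 1
   UNTIL Done DO
      READ Rec FROM hTable, Key THEN
         NULL  ;* write Key and Rec to the ASCII output here
      END
   REPEAT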

As for the import, I suspect it is due to one of two causes: 1) the file hasn't been presized large enough (easy enough to change), or 2) the record keys don't hash evenly, so large numbers of records hash to a small number of groups and wind up in overflow. We experience the second problem in UniVerse even though it has 18 different hashing algorithms to choose from; with just one in linear hash, it's that much more likely to happen. If it is this second problem, you could change your key structure or create an MFS to break up the logical file into several physical files, limiting the depth of overflow. You could check whether this is necessary by going into DUMP and seeing whether you have many empty groups (or by writing a file-system-level program to report the average, lowest, and highest number of records per group, etc.).

Ron Wielage

Standard Life Insurance of Indiana


At 18 FEB 1999 05:44AM Steve Smith wrote:

Don, for an export, there are a few factors -

(1) DOS is really bad at increasing the size of large files (> 10MB).

(2) Disk fragmentation can be a real drag.

(3) Don't have indexes on the files to export

(4) The degradation at the 300,000 - 400,000 record mark could be cache thrashing on the server if you're on a network. Increase server RAM if possible.

Advice:

(1) Export to a series of large (> 40000 bytes)

(9) Use decent CPUs and fast disks.

(10) Don't call unnecessary subroutines/functions, and don't rely on dictionaries.

(11) Use (if possible) the non-networking driver from a separate copy of AREV for the task. The lack of lock calls will improve speed.

(12) If you don't suffer from memory creep, don't FLUSH or GARBAGECOLLECT at all; if you do, perform them infrequently (every 100 reads or so). These take time. (A sketch combining this with item (1) follows this list.)

(13) Consider that you might not have to handle this much data in and out of AREV. I once wrote a series of routines to read direct from CD-ROM into AREV, by constructing an index to the CD-ROM data, instead of importing it into AREV proper. The index took a fraction of the time of a full data load and unload. Consider if you have to load the data into / out of AREV at all, or whether another technique is possible.
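As a rough sketch of how items (1) and (12) might fit together (the table name, output path, chunk size, and 100-read interval are placeholder assumptions, not anything measured):

   OPEN "CALLS" TO hTable ELSE STOP
   OSOPEN "C:\EXPORT\CALLS.TXT" TO hOut ELSE STOP

   SELECT hTable

   Buffer = ""
   Offset = 0
   Count  = 0
   Done   = 0
   LOOP
      READNEXT Key ELSE Done = 1
   UNTIL Done DO
      READ Rec FROM hTable, Key THEN
         Line = Key : @FM : Rec
         CONVERT @FM TO "," IN Line   ;* multivalue/sub-table handling omitted
         Buffer = Buffer : Line : CHAR(13) : CHAR(10)
      END

      * Item (1): only hit DOS with large writes
      IF LEN(Buffer) GT 40000 THEN
         OSBWRITE Buffer ON hOut AT Offset
         Offset = Offset + LEN(Buffer)
         Buffer = ""
      END

      * Item (12): reclaim workspace only occasionally
      Count = Count + 1
      IF MOD(Count, 100) = 0 THEN
         GARBAGECOLLECT
         FLUSH
      END
   REPEAT

   * Write whatever is left in the buffer
   IF LEN(Buffer) THEN OSBWRITE Buffer ON hOut AT Offset
   OSCLOSE hOut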

Hope this helps,

Steve


At 22 FEB 1999 09:27AM Don Bakke wrote:

Steve/Ron/Victor,

I am still digesting your very informative responses. But I want to thank you now because it will be a little while before I can test everything fully.

I have a followup question based on Steve's suggestion to use RAMDISK. We had already considered this and in fact were planning on purchasing a lot of RAM for this purpose. However, upon experimentation we discovered that the largest RAMDISK we could create is 32MB. We are using Win98.

Is there any way around this? Ultimately we would like to create a 400MB RAMDISK…but anything close to 128MB right now would be great.

[email protected]

SRP Computer Solutions
