OSBREAD/WRITE (AREV Specific)
At 06 SEP 1998 04:35:11PM Chip Wogu wrote:
I am reading a flat file over 3 Megs and trying to Insert/update an Arev table with about 100 fields. My problem is that it is taking over
48 hours to handle about 8000 rows. Below is my is a scaled down sample Code and I will appreciate if someone can show me what changes I need to make to reduce the time. Thanks
Chip
OPEN 'DICT.EMP.DEDUCTIONS' TO @DICT ELSE STOP
OPEN 'EMP.DEDUCTIONS' TO DEDFILE ELSE
CALL MSG('cannot open DED','A',
,
)END
SWAP CHAR(13):CHAR(10) WITH @FM IN TRANSACT
REC.CNT=COUNT(TRANSACT,@FM) ;* number of records
REC='
IDX=0
@ID='
SPACE5= '
SPACE9= '
SPACE1= '
RECORDLENGTH=533 /*including CRLF*/
LOOP
OSBREAD DATA FROM TRANSACT AT IDX LENGTH RECORDLENGTH
IDX += RECORDLENGTH
WHILE LEN(DATA)
EMPLID=DATA1,5
WHILE EMPLID
READ REC FROM DEDFILE,EMPLID ELSE REC='
DESCR2
ROW1=DATA12,5
IF ROW1 NE SPACE5 THEN
REC='REC=ROW1END ELSE
REC='
END
ROW2=ICONV(DATA17,9,'MD2')
IF ROW2 NE SPACE9 THEN
REC='REC=ROW2END ELSE
REC='
END
WRITE REC TO DEDFILE,EMPLID
REPEAT
RETURN
*=====
At 06 SEP 1998 09:00PM Paul Rule wrote:
Does the Arev file you are updating have any indexes on it?
If it does, it is sometimes quicker to remove all the indexes from the file before doing a big update and then add them back on after the update. Another thing that may make a slight improvement is to read bigger blocks of data with your OSBREAD.
At 06 SEP 1998 10:31PM Chip Wogu wrote:
]
Yes, my table has indexes and I will try removing but how do I read and bigger blocks of data with OSBREAD. Please bear with me I am working without an Arev manual.
Thanks
Chip Wogu
CWOGU@MSN.COM
At 06 SEP 1998 11:09PM Steve Smith wrote:
This is off the top of my head, and some zeros will land
in fields 6 & 7 of EMP.DEDUCTIONS, but anyway…
Try something like (swap SOURCE.TXT with your input filename) :
INPUT.DOS.FILENAME=SOURCE.TXT' OPEN 'EMP.DEDUCTIONS' TO DEDFILE ELSE CALL MSG('cannot open DED','A','','') RETURN END IDX=0 RECORDLENGTH=533*90 LOOP DATA=' OSOPEN INPUT.DOS.FILENAME TO TRANSACT THEN OSBREAD DATA FROM TRANSACT AT IDX LENGTH RECORDLENGTH THEN IF LEN(DATA) THEN IDX += RECORDLENGTH B=(LEN(DATA)/533)-1 FOR A=0 TO B OFFSET=A*533 EMPLID=DATAOFFSET+1,5 READ REC FROM DEDFILE,EMPLID ELSE REC=' REC=TRIM(DATA(OFFSET+12),5) REC=TRIM(DATA(OFFSET+17),9)*100 WRITE REC TO DEDFILE,EMPLID NEXT A END END END WHILE DATA REPEAT CALL INDEX.FLUSH('','') RETURNIt depends on the input data you are working with, but on the
whole DOS is slow handling large files, so read chunks
containing many records, and ICONV is slow and may be replaced
with faster string handling or maths functions.
You should have an AREV manual with your licensed copy of the
application. Saying you don't have a manual is like saying you have an
unlicensed copy of AREV.
Hope this helps,
Steve
At 07 SEP 1998 12:22AM Chip Wogu wrote:
<It depends on the input data you are working with, but on the
<whole DOS is slow handling large files, so read chunks
<containing many records, and ICONV is slow and may be replaced
<with faster string handling or maths functions.
<You should have an AREV manual with your licensed copy of the
<application. Saying you don't have a manual is like saying you have <an
<unlicensed copy of AREV.
I understand what you were saying regarding the manual. The manuals grew a few legs prior to arriving on this job and we have been debating on our future direction i.e go Openinsight or VB. But for now I need to load this data by wednesday this week.
My source file is fixed length for all rows, and columns without data will contain spaces for the proper length of the column
Do you know the alternative string I can use to replace the Iconv. My source data already contain decimal points for amount fields(12.00; .97 or 1.32)
Also in your sample code I did not understand if you I am reading
47790 with RECORDLENGTH=533*90 and if that is what you meant by chunks and what you are doing with B=(LEN(DATA)/533)-1
My code recordlength means that each row contains 533 characters.
Thanks for your help
Chip Wogu
CWOGU@MSN.COM
At 07 SEP 1998 03:27AM Steve Smith wrote:
To reduce I/O it is faster to pick up 90 records at a time, and process this "chunk" as a series of fixed length records
of 533 bytes.
B=(LEN(DATA)/533)-1 is effectively the count of records you have picked up in the chunk, minus one.
B will usually be 89, except in the final read of the file
where you may read less than 89 depending on how many records
there are in the input file.
In effect this is like read-ahead caching of your input file.
The ICONV(variable,'MD2') in my example is replaced by a
variable * 100. This delivers the same effect, but should be a
bit faster, providing your data represents whole integers as x.00
Steve
At 07 SEP 1998 09:26AM Chip Wogu wrote:
Since all the rows contains spaces when there is no value, what does the TRIM syntax do to the incoming data and it's position in the source file. Does it mean that my previous conditional is no longer needed?(My assumption is that I do not need to do my IF statements. Also can I follow through with the rest of the fields:
REC=TRIM(DATA(OFFSET+12),5)
REC=TRIM(DATA(OFFSET+17),9)*100
REC=TRIM(DATA(OFFSET+26),9)*100
REC'S till poistion 533 is reached
I would like to thank Steve Smith and everyone for the help.
Thanks
Chip Wogu
CWOGU@MSN.COM
At 07 SEP 1998 12:25PM Steve Smith wrote:
The TRIM(variable) syntax removes leading and trailing spaces from the string. This enables blank fields to become null. It enables maths routines to operate. It may be more expensive in terms of processing time, though. I'll let you test the output and determine what works best/fastest.
Take backups of all data ahead of your conversion.
Steve
At 07 SEP 1998 08:44PM Chip Wogu wrote:
Steve,
This is the code you assisted me with. I just want to make sure I did not mess up along the way and what does CALL INDEX.FLUSH(
,
) do.Once again thanks for the big help.
SUBROUTINE PSFT_DEDCALC_CHGS
OSOPEN 'D:\DEDCALC\VGENCDED.DAT' TO TRANSACT ELSE
*OSOPEN 'N:\HR512\TESTSQR\VGENCDED.TST' TO TRANSACT ELSE
CALL MSG('cannot open TRANSACT','A','','')END
OPEN 'DICT.EMP.DEDUCTIONS' TO @DICT ELSE STOP
OPEN 'EMP.DEDUCTIONS' TO DEDFILE ELSE
CALL MSG('cannot open DED','A','','')RETURNEND
SWAP CHAR(13):CHAR(10) WITH @FM IN TRANSACT
REC.CNT=COUNT(TRANSACT,@FM) ;* number of records
REC='
IDX=0
RECORDLENGTH=533*90
LOOP
DATA='
OSOPEN INPUT.DOS.FILENAME TO TRANSACT THENOSBREAD DATA FROM TRANSACT AT IDX LENGTH RECORDLENGTH THENIF LEN(DATA) THENIDX += RECORDLENGTHB=(LEN(DATA)/533)-1FOR A=0 TO BOFFSET=A*533EMPLID=DATAOFFSET+1,5READ REC FROM DEDFILE,EMPLID ELSE REC='REC =TRIM(DATA(OFFSET+12),5) /*Row #2 */REC =TRIM(DATA(OFFSET+17),9)*100REC =TRIM(DATAOFFSET+26),9)*100REC =TRIM(DATA(OFFSET+35,1)REC =TRIM(DATA(OFFSET+36,1)REC =TRIM(DATA(OFFSET+37,1)REC =TRIM(DATA(OFFSET+38),5) /*Row #1 */REC =TRIM(DATA(OFFSET+43,9)*100REC =TRIM(DATA(OFFSET+52,9)*100REC =TRIM(DATA(OFFSET+61,1)REC =TRIM(DATA(OFFSET+62,1)REC =TRIM(DATA(OFFSET+63,1)REC =TRIM(DATA(OFFSET+64,5) /*Row #6 */REC =TRIM(DATA(OFFSET+69,9)*100REC =TRIM(DATA(OFFSET+78,9)*100REC =TRIM(DATA(OFFSET+87,1)REC =TRIM(DATA(OFFSET+88,1)REC =TRIM(DATA(OFFSET+89,1)REC =TRIM(DATA(OFFSET+90,5) /Row #7 */REC =TRIM(DATA(OFFSET+95,9)*100REC =TRIM(DATA(OFFSET+104,9)*100REC =TRIM(DATA(OFFSET+113,1)REC =TRIM(DATA(OFFSET+114,1)REC =TRIM(DATA(OFFSET+115,1)REC =TRIM(DATA(OFFSET+116,5) /* Row #8 */REC =TRIM(DATA(OFFSET+121,9)*100REC =TRIM(DATA(OFFSET+130,9)*100REC =TRIM(DATA(OFFSET+139,1)REC =TRIM(DATA(OFFSET+140,1)REC =TRIM(DATA(OFFSET+141,1)REC =TRIM(DATA(OFFSET+142,5) /*Row #10 */REC =TRIM(DATA(OFFSET+147,9)*100REC =TRIM(DATA(OFFSET+156,9)*100REC =TRIM(DATA(OFFSET+165,1)REC =TRIM(DATA(OFFSET+166,1)REC =TRIM(DATA(OFFSET+167,1)REC =TRIM(DATA(OFFSET+168,5) /* Row #11 */REC =TRIM(DATA(OFFSET+173,9)*100REC =TRIM(DATA(OFFSET+182,9)*100REC =TRIM(DATA(OFFSET+191,1)REC =TRIM(DATA(OFFSET+192,1)REC =TRIM(DATA(OFFSET+193,1)REC =TRIM(DATA(OFFSET+194,5) /* Row #13 */REC =TRIM(DATA(OFFSET+199,9)*100REC =TRIM(DATA(OFFSET+208,9)*100REC =TRIM(DATA(OFFSET+217,1)REC =TRIM(DATA(OFFSET+218,1)REC =TRIM(DATA(OFFSET+219,1)REC =TRIM(DATA(OFFSET+220,5) /*Row #16 */REC =TRIM(DATA(OFFSET+225,9)*100REC =TRIM(DATA(OFFSET+234,9)*100REC =TRIM(DATA(OFFSET+243,1)REC =TRIM(DATA(OFFSET+244,1)REC =TRIM(DATA(OFFSET+245,1)REC =TRIM(DATA(OFFSET+246,5) /* Row #17 */REC =TRIM(DATA(OFFSET+251,9)*100REC =TRIM(DATA(OFFSET+260,9)*100REC =TRIM(DATA(OFFSET+269,1)REC =TRIM(DATA(OFFSET+270,1)REC =TRIM(DATA(OFFSET+271,1)REC =TRIM(DATA(OFFSET+272,5) /* Row #21 */REC =TRIM(DATA(OFFSET+277,9)*100REC =TRIM(DATA(OFFSET+286,9)*100REC =TRIM(DATA(OFFSET+295,1)REC =TRIM(DATA(OFFSET+296,1)REC =TRIM(DATA(OFFSET+297,1)REC =TRIM(DATA(OFFSET+298,5) /* Row #22 */REC =TRIM(DATA(OFFSET+303,9)*100REC =TRIM(DATA(OFFSET+312,9)*100REC =TRIM(DATA(OFFSET+321,1)REC =TRIM(DATA(OFFSET+322,1)REC =TRIM(DATA(OFFSET+323,1)REC =TRIM(DATA(OFFSET+324,5) /* Row #25 */REC =TRIM(DATA(OFFSET+329,9)*100REC =TRIM(DATA(OFFSET+338,9)*100REC =TRIM(DATA(OFFSET+347,1)REC =TRIM(DATA(OFFSET+348,1)REC =TRIM(DATA(OFFSET+349,1)REC =TRIM(DATA(OFFSET+350,5) /* Row #27 */REC =TRIM(DATA(OFFSET+355,9)*100REC =TRIM(DATA(OFFSET+364,9)*100REC =TRIM(DATA(OFFSET+373,1)REC =TRIM(DATA(OFFSET+374,1)REC =TRIM(DATA(OFFSET+375,1)REC =TRIM(DATA(OFFSET+376,5) /* Row #105 */REC =TRIM(DATA(OFFSET+381,9)*100REC =TRIM(DATA(OFFSET+390,9)*100REC =TRIM(DATA(OFFSET+399,1)REC =TRIM(DATA(OFFSET+400,1)REC =TRIM(DATA(OFFSET+401,1)REC =TRIM(DATA(OFFSET+402,5) /* Row #30 */REC =TRIM(DATA(OFFSET+407,9)*100REC =TRIM(DATA(OFFSET+416,9)*100REC =TRIM(DATA(OFFSET+425,1)REC =TRIM(DATA(OFFSET+426,1)REC =TRIM(DATA(OFFSET+427,1)REC =TRIM(DATA(OFFSET+428,5) /* Row #23 */REC =TRIM(DATA(OFFSET+433,9)*100REC =TRIM(DATA(OFFSET+442,9)*100REC =TRIM(DATA(OFFSET+451,1)REC =TRIM(DATA(OFFSET+452,1)REC =TRIM(DATA(OFFSET+453,1)REC =TRIM(DATA(OFFSET+454,5) /* Row #24 */REC =TRIM(DATA(OFFSET+459,9)*100REC =TRIM(DATA(OFFSET+468,9)*100REC =TRIM(DATA(OFFSET+477,1)REC =TRIM(DATA(OFFSET+478,1)REC =TRIM(DATA(OFFSET+479,1)REC =TRIM(DATA(OFFSET+480,5) /*Row #28 */REC =TRIM(DATA(OFFSET+485,9)*100REC =TRIM(DATA(OFFSET+494,9)*100REC =TRIM(DATA(OFFSET+503,1)REC =TRIM(DATA(OFFSET+504,1)REC =TRIM(DATA(OFFSET+505,1)REC =TRIM(DATA(OFFSET+506,5) /*Row #31 */REC =TRIM(DATA(OFFSET+511,9)*100REC =TRIM(DATA(OFFSET+520,9)*100REC =TRIM(DATA(OFFSET+529,1)REC =TRIM(DATA(OFFSET+530,1)REC =TRIM(DATA(OFFSET+531,1)WRITE REC TO DEDFILE,EMPLIDNEXT AENDENDENDWHILE DATA
REPEAT
CALL INDEX.FLUSH(
,
)RETURN
At 07 SEP 1998 10:34PM Steve Smith wrote:
Uncomment the OSOPEN line ahead of the OSBREAD, and assign your input filename to input.dos.filename as in the earlier example . Delete the SWAP statement at the start. Don't worry about assigning @DICT or @ID - they don't get used here, Remove the TRIM(…..) from any input fields where data is always present.
Remember to back up your files ahead of the import.
Steve.
At 07 SEP 1998 11:24PM Chip Wogu wrote:
«Uncomment the OSOPEN line ahead of the OSBREAD, and assign your «input filename to input.dos.filename as in the earlier example . «Delete the SWAP statement at the start. Don't worry about assigning «@DICT or @ID - they don't get used here, Remove the TRIM(…..) «from any input fields where data is always present.
I will remove the swap syntax and use the input.dos.filename as recommended also it looks like I may have to leave the TRIM(….) syntax because there is inconsistencies in the fields coming in. Some keys will be missing some columns, and I plan to reuse this code as a method of importing weekly changes only into the table. I have already performed my data and thank you for words of wisdom. Just for my own understanding and learning, is there an effciency derived from doing INPUT.DOS.FILENAME=H:\DEDCALC\VGENCDEDDAT.DAT over OSOPEN 'D:\DEDCALC\VGENCDED.DAT' TO TRANSACT ELSE.
As always tahnk you
Chip Wogu
CWOGU@MSN.COM
At 08 SEP 1998 04:06AM Steve Smith wrote:
The change you suggest is not that much more efficient. I was trying to flesh out your original example, which lacked the input filename.
Steve
At 08 SEP 1998 11:49PM Chip Wogu wrote:
Steve,
As you previously observed, I am getting .00 (zeros) when TRIM(DATA(OFFSET+17),9)*100 has no data. I am planning the code below to correct it. Do you if there is a more efficient way of doing this.
IF TRIM(DATA(OFFSET+17),9) THEN
REC =TRIM(DATA(OFFSET+17),9)*100
END ELSE
REC= '
END
Thanks
Chip Wogu
CWOGU@MSN.COM
At 09 SEP 1998 08:21AM Steve Smith wrote:
A=TRIM(DATA(OFFSET+17),9)
IF A THEN REC=(A*100)
Use the above two lines in place of the old single line. Where input data is null or zero then nothing is applied to the target file. Only non-null and non-zero data is applied. A is a temporary variable to save the TRIM and offset calculations occuring twice.
Steve
At 09 SEP 1998 11:14AM Chip Wogu wrote:
Steve,
Thanks for all the help!!!
It took me about 12 hours or less to load 7128 rows using your sample code and removing the corresponding indexes.
I will make the additional changes for the null and zero columns.
Once again thanks
Chip Wogu
CWOGU@MSN.COM
At 09 SEP 1998 05:31PM Chip Wogu wrote:
I tried testing how long it takes to load 100 rows of data and my test
show about 30 mins and less than 1 min if I have the Write commented.
Do you think I may have a problem with my Write statement and is there a better way to improve on it.
Thanks
Chip Wogu
CWOGU@MSN.COM
At 10 SEP 1998 03:13AM Steve Smith wrote:
The write sounds suspiciously slow.
Perhaps there are many indexes on this file, but if you've removed them then this would improve performance.
This would explain the slow performance.
Otherwise, a slow network link. If the target file is very large then this could also contribute to the speed issues.
I don't think the code can be optimized much more. Remove the INDEX.FLUSH(
,
) call as it updates the indexes after the import. If indexes are off INDEX.FLUSH will waste time.Steve
At 11 SEP 1998 12:27AM Victor Engel wrote:
PMFJI, but I have a few observations to make.
* What you are trying to do here is really pretty basic. I think it's great that Steve is spending the amount of time he has been with you. However, it is something that can be picked up quickly from the manuals. If you don't have manuals available, order a set. That will save you a lot of time.
* OSBREAD, OSREAD, OSBWRITE, and OSWRITE are quite fast. They will impact your speed minimally compared to successive writes to an Arev file.
* String manipulation is also very fast compared to I/O unless you have a very old CPU and a fast hard drive, not likely in today's world.
* Some specific comments about your program:
OSOPEN 'D:\DEDCALC\VGENCDED.DAT' TO TRANSACT ELSE
You have just created a handle to a DOS file. From now on, this handle should remain unaltered within the program. Use it for such commands as OSBREAD and OSBWRITE.
SWAP CHAR(13):CHAR(10) WITH @FM IN TRANSACT
Don't mess with the file handle.
REC.CNT=COUNT(TRANSACT,@FM) ;* number of records
How now brown cow? If you want to get a record count, try using the DIR function. Pass a file name, and it returns a dynamic array containing file size, date, and time. If you have a static record size, you can determine the number of records from the file size.
REC='
IDX=0
RECORDLENGTH=533*90
Just a comment on semantics here. It is not really the record size that is 533*90. You are actually defining a buffer size here. The record size as I read your program is 533. Whenever possible, name your variables, and use them in such a way that they convey the content of the variable at any point in the program.
LOOP
DATA='
Not necessary to initialize this variable.
*OSOPEN INPUT.DOS.FILENAME TO TRANSACT THEN
OSBREAD DATA FROM TRANSACT AT IDX LENGTH RECORDLENGTH THEN
IF LEN(DATA) THEN
IDX += RECORDLENGTH
B=(LEN(DATA)/533)-1
FOR A=0 TO B
OFFSET=A*533
EMPLID=DATAOFFSET+1,5
READ REC FROM DEDFILE,EMPLID ELSE REC='
REC=TRIM(DATA(OFFSET+12),5) /*Row #2 */
etc.
For this kind of code you would be better off to define a table and execute the assignments in a loop. It will allow for better maintainability and cut down on your code size. Here is a sample snippet that might result:
for field_no=1 to num_fields
offset=layout
length=layout
this_value=trim(dataoffset,length)
fmc=layout
rec=this_value
next
You get the idea….
WRITE REC TO DEDFILE,EMPLID
NEXT A
END /*if len(data) */
END /*osbread data */
END /* I assume this is left from the OSOPEN, in which case it should be commented out also */
WHILE DATA
You should move this WHILE DATA to a point right after the DATA is loaded. Otherwise all the lines from the OSBREAD to here will be performed on null data.
REPEAT
CALL INDEX.FLUSH(
,
)With a mass update, it is frequently more efficient to remove indexing before the batch job and reapply them at the end. If you don't want to mess with that, use BATCH.INDEXING to improve performance.
I have a REALLY hard time believing this process runs as slowly as you say. What is your setup? Are you running over a WAN? What is the network setup as well as the workstation setup? Is your destination file appropriately sized? Dump it and check the sizelock. It should be 0 unless you have a good reason for it not to be.
At 16 SEP 1998 07:56AM Ray Blackwell wrote:
Chip,
This type of transaction transfer is something that I do on an overnight basis via modem, file sizes can range from 1 to 10-15 megs, the actual read and disembowling of the OS file should be very, very quick. I will be quite happy to email you a program which you can adjust to suit your own purposes which should do the trick very effectively.
Kind Regards Ray
blackers@world-net.co.nz
At 20 SEP 1998 11:36PM Peter Lynch wrote:
I dont know why your OSBREAD should be slow - it probably is not the cause of your performance problem.
The writes to EMP.DEDUCTIONS are almost certainly the cause of your performance problem. Removing indices will help.
If many of the records are new to EMP.DEDUCTIONS, resizing the file
and sizelocking it before running may help.
Maybe there is a bug - the TRANSACT variable is incorrectly used in your sample code.
And if it is the OSBREAD which is causing the problem, you can read it in larger blocks using this technique. -
In the sample, the Variable TRANSACT is used as a file variable and
as a list of keys.
@DICT=OPEN.FILE('DICT.EMP.DEDUCTIONS')
SWAP CHAR(13): CHAR(10) WITH @FM IN TRANSACT
REC.CNT=COUNT(TRANSACT, @FM)
* number of records
REC=
IDX=0 @ID=
SPACE5= ' '
SPACE9= ' '
SPACE1= ' '
RECORDLENGTH=533
BLOCK.LENGTH=40 * RECORD.LENGTH
DATA.BLOCK='
LOOP
DATA.FOUND =FALSEGOSUB GET.NEXT.DATAWHILE DATA.FOUND
EMPLID=DATA1, 5WHILE EMPLID
REC=GET.DATA("EMP.DEDUCTIONS", '',EMPLID)
***DESCR2ROW1=DATA12, 5REC='IF ROW1 NE SPACE5 THENREC=ROW1ENDROW2=ICONV(DATA17, 9, 'MD2' )REC= ''IF ROW2 NE SPACE9 THENREC=ROW2ENDSAVE.DATA("EMP.DEDUCTIONS", '', EMPLID, REC)REPEAT
RETURN
GET.NEXT.DATA:
DATA.FOUND=TRUEIF DATA.BLOCK=' THENOSBREAD DATA.BLOCK FROM TRANSACT AT IDX LENGTH BLOCK.LENGTHIDX += BLOCK.LENGTHENDDATA=DATA.BLOCK1,RECORD.LENGTHDATA.BLOCK1,RECORD.LENGTH='IF DATA=' THENDATA.FOUND=FALSEENDRETURN
At 22 OCT 1998 10:26PM Chip Wogu wrote:
Victor,
Thanks for responding to my posting about a month ago and mentioning
the great help I recieved from Steve Smith. I really appreciate the assistance from everyone and I did save your comments between codes and I hope to use them for future reference.
Thanks
Chip Wogu
CWOGU@MSN.COM