A select that makes Novel Crawl - Revisited (AREV Specific)
At 27 OCT 1998 11:28:52AM Claude Mansutti wrote:
Hello All,
I'm re-posting my interesting problem with updates at the bottom og the page:
I wonder if someone can explain why for a particular select, a client of mine has such serious network affecting results.
The select is as follows
select the_table by (a symbolic stripping the first part of a multi-part key{indexed field}) by (second part of a multi-part key {not indexed}) with (the symbolic between two integer values) and with (a field {indexed} ne "A" and ne "B")
Using Novell Netware 3.12 with Arev NLM (Ver 1.12 I think) on a Pentium server with 64MB RAM.
The table has some 50,000 records.
On the network, running this select grinds the whole network to a slow and painful crawl. The select takes about 10 minutes to complete, then the network goes back to normal pace.
If I take the whole data folder and place it on my P200 at home, the select takes about 20 seconds to complete
I have tried the following things on the clients site:
Delete all records in the Lists table and compress
Delete all records in the Systemp table and compress
Installed a dedicate indexer.
Nothing seems to work.
Is it Arev or is it Novell that needs bolstering?
Updates since this posting
I tried checking to see if ARev re-calculated the symbolic during the select - it didn't.
I then removed the symbolic and then just used the first part (Indexed) of the multi-part key in the select - Result: Network Ground to a snails pace.
I took the database to our office and ran it on our 3.12 Novell Network - It also ran at a snails pace.
So - the select takes 20sec on a stand alone and between 5-10 minutes on Novell 3.12… Why Why Why??????
Any Suggestions would be gratefully accepted.
At 27 OCT 1998 03:14PM Serguei Poliakov wrote:
]The table has some 50,000 records.
]On the network, running this select grinds the whole network to a ]slow and painful crawl. The select takes about 10 minutes to ]complete, then the network goes back to normal pace.
]If I take the whole data folder and place it on my P200 at home, the ]select takes about 20 seconds to complete
For me it looks like most of time is spent not on actual select of
data, but on key sorting (you use BY in your select).
As you have many records and the key is not short, AREV have to swap
sort buffer to disk. If the disk used for temporary file by AREV is a
network drive it will take much more time than on your home computer
where your data actually is not written to disk but is help in memory
(disk cash).
Serguei
At 27 OCT 1998 03:19PM K Gilfilen wrote:
Here's my guess, though others with more knowledge in specific areas may differ on the details.
YOUR QUERY:
select the_table
by (a symbolic stripping the first part of a multi-part key{indexed field})
(I AM ASSUMING THE SYMBOLIC IS INDEXED, NOT JUST THE FIELD) BEING INDEXED, THIS SHOULD GO PRETTY QUICKLY, AND THE KEYS RESULTING FROM THE SEARCH WILL BE STORED IN A DOS FILE ON THE CLIENT MACHINE. THIS SHOULD BE THE LAST PART OF THE QUERY'S PROCESSING. ANYWAY, THE KEYS RETURNED FROM THIS SEARCH HAVE TO BE TRANSPORTED TO THE CLIENT MACHINE.
by (second part of a multi-part key {not indexed})
NOT BEING INDEXED, THE FILE SYSTEM HAS TO GO THROUGH ALL KEYS IN THE TABLE, CRUNCH THE SYMBOLIC FORMULA (OR IF IT'S NOT A SYMBOLIC, IT STILL HAS TO PARSE THE MULTIPART KEY), SORT/STORE THE RESULTING KEYS.
IT MAY OR MAY NOT MERGE THEM INTO THE FILE ALREADY STORED ON THE CLIENT. THIS SHOULD BE PRETTY FAST ON STAND-ALONE, BUT MIGHT BE PRETTY I/O INTENSIVE, PUTTING A LOAD ON THE NETWORK. THAT GOES FOR THE PART WHERE IT BRINGS THE KEYS TO THE CLIENT. DOES IT RUN THE SYMBOLIC ON THE CLIENT OF ON THE SERVER? YOU WOULD DO WELL TO INDEX THIS SYMOBLIC.
with (the symbolic between two integer values) and
IS THIS INDEXED? IF NOT, IT HAS TO BE CRUNCHED AFTER THE DATA IS READ FROM EACH RECORD (A SEPERATE READ OF EVERY RECORD IN THE FILE, FROM OTHER READS FROM OTHER PARTS OF THE QUERY), AND THIS MUST BE SORTED AND MERGED WITH THE NEXT VALUE WHICH BEING INDEXED WILL EXTRACT QUICKLY AND IN SORTED FORM…
with (a field {indexed} ne "A" and ne "B")
THEN THIS MERGED GROUP OF RECORD KEYS, WHICH RECORDS HAVE VALUES OF A CERTAIN FIELD NOT EQUAL TO "A" AND NOT EQUAL TO "B", HAS TO BE MERGED AND THEN PUT IN THE ORDER OF THE FIRST SORT.
BACK TO NORMAL CASE NOW: The non-indexed part of the query are processing intensive, and require a lot of I/O. If the processing occurs on the client machine, but the file exists on the server, big chunks of data come across the network after the individual records are read and dictionary is read to find out where the field in question is. If this is all occurring on a stand-alone, your CPU bus is the weak link, and probably does not come close to reaching its capacity. If it is occurring on a network, your backbone and NLMs are the weak links, with boatloads of data traveling in packets or frames, competing with all of the other packets/frames from other clients. Very likely you fully tax the network's capacity with your process alone. The parts of the query involving merges of files (files created during query processing) on the client machine should go pretty quickly. I don't know if any of the files created reside on the server.
But I think this is a data schema issue, not network transport. Data design should (there's that word) be independent of the infrastructure, but if you push the design envelope you can hit the barrier with a schema that does not complement the application layer of the network. It may be worth your time to study how AREV processes different types of queries and adjust your schema/queries accordingly.
At 27 OCT 1998 07:55PM Steve Smith wrote:
Claude,
Some suggestions, FWIW,
(a) try attaching a local copy of LISTS to
reduce network traffic during I/O - this may
help a bit if the process is network-constrained.
(b) I think (not sure) there's a way
(DOS environment setting) or SYSPROG
setting to make AREV's sort path local.
Try this to reduce network traffic.
© create a single symbolic which
resolves to your sortable key, and
BTREE index this. This might reduce
the select complexity and incumbent
traffic overhead.
Hope this helps - good luck.
Steve
At 28 OCT 1998 04:16AM Curt Putnam wrote:
The more complex the SELECT, the more unpredictably AREV behaves. I don't have your answer, but do have the following suggestion.
Do the outer select 1st (cuts the total population of records that have to be dealt with by some - hopefully large - fraction.) Then the next most restrictive as another discrete select, then the next and so on.
Do the sort last.
Might help
At 28 OCT 1998 07:59AM Claude Mansutti wrote:
Thank you all for your responses.
I'll get to work and try your various suggestions and will post the results
Claude
At 28 OCT 1998 07:39PM Warren wrote:
Along with some of the other suggestions try 'field GT "B"' vs 'field NE "A" and field NE "B"'
At 29 OCT 1998 06:44PM Claude Mansutti wrote:
Thanks everyone for your suggestions.
I spent some time grinding our in house network to a crawl determining where the backlog and delays occurred.
I found the main culprit.
It was along the lines of Warrens suggestion:
If I had Status (NE "A" and NE "B") it seems like the select ignored the indexes and went and read each record to see what its status was. As 47,000 record had a status of A or B, this clogged the arteries.
by changing the select to with status (eq "O" or eq "" or eq "B" etc) the select changed from a 7-10 minute grinder to a 30 sec minimal network traffic cutie.
Again thanks to you all for your suggestions
P.S. Curt - how do you do a discrete select?