Join The Works program to have access to the most current content, and to be able to ask questions and get answers from Revelation staff and the Revelation community

At 28 JAN 2022 01:16:25AM Mark Boorman wrote:

Whilst I'm sure there's no definitive answer, as a general guide is there a set of conditions whereby btree.extract (or rlist for that matter) can be expected to be time consuming and/or unreliable?

Currently facing performance issues with a web application and whilst I feel there are varying issues at play I've found some requests that are effectively stalling everything because of a btree.extract taking an excessive amount of time, somewhere in the vicinity of twenty minutes.

What we end up with is ten OEngines running and the CPU maxed out continuously. Note that when everything is humming along, the CPU sits at 1 or 2% and only intermittently spikes to maybe 20 or 30%.

Then on occasion, like yesterday, it maxes out and never recovers.

Through searching logs I stumbled over the 20-minute btree.extracts. Not a lot of them, but enough for me to consider them a contributing factor.

The search criteria searches on two fields.

One a date range, the other a list of customers. In these problematic ones, the list of customers to search for exceeds 1,000.

To be clear, I don't promote that scenario and have often shared my concerns about its impracticality, but it's obviously falling on deaf ears. What I'd like to know is whether there's a number I can point them to, beyond which they shouldn't expect reasonable results.

E.g. "How about limiting it to 100 customers, or 20 customers?" so I can provide a little more concrete input.

Or should I expect 1,000 values to work and the fact that it takes 20 minutes is indicative of some other issue?


At 28 JAN 2022 01:24AM Donald Bakke wrote:


Mark - 1,000 values returned in and of itself should not be a problem. You can go magnitudes higher and it should not be a problem. I strongly suspect there is some other issue.

That's not to say Btree.Extract is necessarily the right fit for your search, but without knowing how your search criteria are set up, and without knowing anything about your table, dictionary, etc., I can only speculate.

Don Bakke

SRP Computer Solutions, Inc.


At 28 JAN 2022 04:07AM Andrew McAuley wrote:

That sort of speed is appalling. We routinely work with databases with 10s of millions of rows and would be horrified by that performance.

I'm not a betting man, but my guess is that instead of using BETWEEN on date ranges you're using >= and <=.

The former goes into the index at the start point and exits at the end point. The latter grabs all rows >= X, then grabs all rows <= Y, then combines the lists.
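To illustrate with the criteria syntax used elsewhere in this thread (a sketch only; StartDate and EndDate are hypothetical internal-format dates): a single '~' range term lets the index be walked once from start to end, whereas two open-ended conditions each produce a full list that must then be combined.

```
* Single inclusive range on the indexed date column -- one walk of the index
SearchCriteria = 'ROSTER_DATE' : @vm : StartDate : '~' : EndDate : @fm

* By contrast, expressing the same range as two half-open conditions
* (everything >= StartDate, plus everything <= EndDate) forces two full
* extracts whose results are then intersected -- far more work on a big index.
```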

The Sprezzatura Group

The Sprezzatura Blog

World leaders in all things RevSoft


At 28 JAN 2022 04:57AM Mark Boorman wrote:

Not sure I've worded this in a manner that is clear.

Table is approx 1 million rows. With 1,000 values I'm not referring to returned keys but rather values to search for.

Search criteria looks something like this:

SearchCriteria = 'CUSTOMER' : @vm : ClientSearch : @fm
SearchCriteria := 'ROSTER_DATE' : @vm : Start - 1 : '~' : enddate + 1 : @fm

It's the ClientSearch variable I'm referring to that has in excess of 1,000 possible values.

I'm also not entirely sure that this is indicative of usual performance. In the desktop app (which is the primary purpose), this is not a scenario that would ever occur, so I don't have anything similar to compare to. This is just a symptom I found occurring during the time we were having issues. I don't know if it's the cause or another symptom.


At 28 JAN 2022 08:21AM bob carten wrote:

Hi Mark,

Your query should not take that long.

Some things to check:

1. Could it be rebuilding the indexes?

In the database manager, is it set to update indexes before Btree.Extract?

If so, is there a backlog of index values? In particular, is there an ALL.IDS entry in the !file's 0 record (or 1 or 2 or 3 …)?

2. Are you running 9.46 with all patches? If not, try upgrading with all patches (easy for me to say …)

3. Could it be running the old version of IDX.SETS?

Does the record CFG_IDX_SETS exist in SYSENV? If so, does it say IDX_SETS2? If not, you might be running the old version. The old version (IDX_SETS_1) would roll out >32K chunks to sort files, then do a sort merge. It could hit a pathological condition that made it extremely slow. You would see piles of .sfx or .rollout files.

4. Is the OI client installed on the machine hosting the webserver?
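For the IDX_SETS check above, a quick test routine might look like this (a sketch only; the SYSENV table and CFG_IDX_SETS record name are as given in the checklist, and the message handling is purely illustrative):

```
* Check which IDX_SETS version is configured -- sketch based on Bob's notes
Open 'SYSENV' To hSysEnv Then
   Read Cfg From hSysEnv, 'CFG_IDX_SETS' Then
      * Current systems should report IDX_SETS2
      Call Msg(@window, 'CFG_IDX_SETS = ' : Cfg)
   End Else
      Call Msg(@window, 'No CFG_IDX_SETS record - the old IDX_SETS may be in use')
   End
End Else
   Call Msg(@window, 'Unable to open SYSENV')
End
```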


At 28 JAN 2022 02:47PM Donald Bakke wrote:


I can't say for certain whether I've performed a search for that many values. I gather that if you reduce the number of values, but otherwise keep the search criteria the same, it performs much faster?

Also, what happens if you perform the 1,000 value search but exclude the ROSTER_DATE part of the search? I'm just wondering how much isolation testing you've done with this to determine where the problem might actually lie (and all of this assumes you've gone through Bob Carten's checklist).

Don Bakke

SRP Computer Solutions, Inc.


At 28 JAN 2022 02:56PM D Harmacek wrote:

Please pretend that you have only 3 client keys in ClientSearch instead of 1,000.

Now, give us exactly what that string of 3 keys looks like. My example: ClientSearch = '123' : @vm : '345' : @vm : '654'

I am suspicious of this part.

Dave Harmacek - Harmacek Database Systems - near Boston, MA USA


At 28 JAN 2022 08:17PM Mark Boorman wrote:

1. Rebuilding indexes? No. That would make the system permanently unusable for everyone, web and desktop users alike; "update before Btree.Extract" has been unchecked for a number of years for that reason. Variations of this request are executed consistently throughout the day, far more than any other, I'd expect. Most complete within milliseconds, or at worst a second or two, but they would also only have a handful of clients to search on, maybe ten or twenty. Anything up to 100 I wouldn't bat an eyelid at.

The 0 record has just a 0 in it.

2. Running 9.40, it seems. As you said, "easy for you to say". :biggrin: They're a 24/7 business planning to move on within the next six months, so we may or may not go through the upgrade process.

3. IDX_SETS2

4. Pretty sure it was, but I've just rerun it to be absolutely positive.


At 28 JAN 2022 08:30PM Mark Boorman wrote:

The search happens all day with a reduced number of values and it goes unnoticed. I only started looking because the cpu was maxing out and grinding to a halt. We log all requests including benchmarks which is how I found the anomaly.

The requests always have the roster_date and (clients or employees).

So no, I haven't as yet performed any isolation tests, I just searched logs and reviewed the ones that stood out.


At 28 JAN 2022 08:40PM Mark Boorman wrote:

Hey Dave,

Yes, that's exactly how it looks and works all day long. The list itself is predefined for each user.

It's just that in some cases the people setting the user up have decided that it makes sense to give a user in excess of 1,000 locations to manage.

Which means it's the same code, just a longer list.


At 29 JAN 2022 06:58AM D Harmacek wrote:

It's all about getting a list of keys, isn't it?

If a user can have from 1 to some 1,000 locations, there must be separate rows, one for each user-location pair. I'd call that the LOCATION table. USER would be a column in that table, and the key would be unique to each user-location pair. Do you have a btree on USER in that table? A call on that USER index of the location table then returns a list of location keys in milliseconds.
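As a sketch of that idea (the LOCATION table, its USER column, and the index on it are all hypothetical here, mirroring the Btree.Extract call pattern used elsewhere in this thread):

```
* Pull a user's location keys straight from a btree index on USER
SearchCriteria = 'USER' : @vm : UserId : @fm
Table = 'LOCATION'
Open 'DICT', Table To @Dict Then
   KeyList = '' ; Option = '' ; Flag = ''
   Call Btree.Extract(SearchCriteria, Table, @Dict, KeyList, Option, Flag)
   * KeyList now holds the user*location keys, @vm-delimited
End
```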

Dave Harmacek - Harmacek Database Systems - near Boston, MA USA


At 30 JAN 2022 06:43AM Joshua Goddard wrote:

I once tried to do what you're doing, and it was extremely slow. I found another way to do it, and maybe it will work for you too.

Call this function:

Collect.IXVals(table, index_column_name)

That will return a list of all the unique values for the index instantly.

Now you can search that list for all of your customers (using the loop-remove method). If you have 1,000 customers, you'd have to do 1,000 searches. Not sure how fast that would be, but you should try it.

If it's too slow, then just convert the list returned by Collect.IXVals into a hash table (SRP's one). Then you just need to do 1000 lookups of the hash table.

If you don't want to use SRP's hash table, then put the list returned by Collect.IXVals into a matrix, and then for each customer, use binary search to look for it in the matrix.
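The matrix/binary-search idea above could be sketched like this (assuming the list returned by Collect.IXVals is already sorted; ValList, Target and the comparison are illustrative, and string collation is assumed consistent):

```
* Membership test by binary search over a sorted @vm-delimited list
Found = 0
Lo = 1
Hi = DCount(ValList, @vm)
Loop
While Lo <= Hi And Not(Found)
   Mid = Int((Lo + Hi) / 2)
   Val = ValList<0, Mid>
   Begin Case
      Case Val = Target  ; Found = 1        ;* exact match
      Case Val Lt Target ; Lo = Mid + 1     ;* look in the upper half
      Case 1             ; Hi = Mid - 1     ;* look in the lower half
   End Case
Repeat
```

With 1,000 customers this is 1,000 lookups of roughly log2(N) probes each, rather than 1,000 linear scans.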


At 30 JAN 2022 06:54AM Joshua Goddard wrote:

If you don't want to use Collect.IXVals and want to keep using btree.extract, there is a way to turn your list of 1,000 customers into a single range.

Use the range operator (~) and search for all customers between the smallest and biggest customer. You will then need to search through the list returned by btree.extract, as it will contain some false positives.
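A sketch of that approach, using only the '~' syntax already shown in this thread (MinCust/MaxCust are illustrative names; as Mark notes below, this only helps when the customer ids collate sensibly):

```
* Collapse the 1,000-value customer list into one inclusive range term
NumVals = DCount(ClientSearch, @vm)
MinCust = ClientSearch<0, 1>
MaxCust = ClientSearch<0, 1>
For i = 2 To NumVals
   Val = ClientSearch<0, i>
   If Val Lt MinCust Then MinCust = Val
   If Val Gt MaxCust Then MaxCust = Val
Next i
SearchCriteria = 'CUSTOMER' : @vm : MinCust : '~' : MaxCust : @fm
* After Btree.Extract, discard any returned key whose customer isn't
* actually in ClientSearch (the false positives inside the range).
```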


At 30 JAN 2022 07:13AM Joshua Goddard wrote:

Actually, I just realized that Collect.IXVals won't work for you, as it returns the values, not the keys. In my case, the keys and values were the same.


At 30 JAN 2022 08:12PM Mark Boorman wrote:

It is about getting a list of keys but not a list of location keys.

The table in question is a table of workshifts with a five part key.

employee*customer*date*start*finish

What we're looking for is all of the shifts for, say, a fortnight. Users of the desktop app, who enter/delete/edit the shifts, only ever work on one site/customer at a time. What they've done via the web is open up access to these rosters/shifts to their clients. Each client may have any number of sites/customers throughout the country, but our client, the user of the desktop, has multiple clients.

The end client is the web user, and they are limited to their own customers/sites. That's where the 1,000 comes in. It's an mv field in a record, ready to hand.

However, we're extracting on the roster table for all shifts within the fortnight for all of their sites/customers, and none of anybody else's.

Does that make it any clearer, or more confusing? I can't tell anymore.


At 30 JAN 2022 08:23PM Mark Boorman wrote:

Hey Josh,

As you've since figured out, collect.ixvals is not the function.

As for the range operator, it works for the dates but there's no such range for customers. They're not necessarily sequential but worse than that, they're varchars and the users have been quite creative in their creation of customer keys.

Recommended best practices have long been thrown out the window. I've lost count of the number of times I've had to tell them that the request they're making doesn't work not because the app is broken but because the url includes an unexpected ampersand that somebody somewhere thought would be good in a customer key.

Again.


At 31 JAN 2022 03:48PM Barry Stevens wrote:


@MarkBoorman

I just ran this test and 1000 keys were returned instantly

subroutine test_mark_btree(Void)

CustomerKeys = ''
For x = 1 To 1000
   CustomerKeys<0,x> = x
   If rem(x,4) Else
      CustomerKeys<0,x> = x : "&" : x : "@" : @upper_case
   End
Next x

Open "WORKSHIFTS" To hFile Then
   For x = 1 To 1000
      Key = 1 : "*" : CustomerKeys<0,x> : "*" : 55435 : "*" : 1000 : "*" : 2000
      Write '' To hFile, Key
   Next x
End

call update_index("WORKSHIFTS")

search_criteria = 'CUSTOMER' : @vm : CustomerKeys : @fm
search_criteria := 'ROSTER_DATE' : @vm : 55435 - 1 : '~' : 55435 + 1 : @fm

table = 'WORKSHIFTS'
Open "DICT", table To @dict Else debug
keylist = ''
option = ''
flag = ''
call Btree.Extract(search_criteria, table, @dict, keylist, option, flag)

debug

return


At 31 JAN 2022 04:19PM Barry Stevens wrote:


More testing shows that the issue is with the combination with ROSTER_DATE.

I will get back with a solution, which I think will be an RList select (Select_Into?).


At 31 JAN 2022 04:43PM Barry Stevens wrote:


Sorry, bum steer; I added more customer keys outside the range, and now just searching on customer keys alone is taking ages.


At 31 JAN 2022 04:53PM Barry Stevens wrote:


Hi, me again.

I can confirm it is MUCH faster in OI10.1 (not instant, but acceptably faster)


At 31 JAN 2022 05:21PM Barry Stevens wrote:

Sorry.

FYI

Timing test for btree.extract:

OI 9.4: 6 minutes for 1,000 customer keys, 101,000 keys in keylist
OI 10.1: 46 seconds for 1,000 customer keys, 101,000 keys in keylist

This is the SSP run on both systems:

subroutine test_mark_btree(Void)

Declare Function select_into

CustomerKeys = ''
CustomerKeysA = ''
For x = 1 To 1000
   CustomerKeys<0,x> = x
   If rem(x,4) Else
      CustomerKeys<0,x> = x : "&" : x : "@" : @upper_case
   End
Next x

For x = 1001 To 60000
   CustomerKeysA<0,x> = x
   If rem(x,4) Else
      CustomerKeysA<0,x> = x : "&" : x : "@" : @upper_case
   End
Next x

Open "WORKSHIFTS" To hFile Then
   For x = 1 To 1000
      For y = 1 To 200
         Key = 1 : "*" : CustomerKeys<0,x> : "*" : 55435 + y : "*" : 1000 : "*" : 2000
         Write '' To hFile, Key
      Next y
   Next x
End

call update_index("WORKSHIFTS")

debug   ;* <--- timing started here

search_criteria = 'CUSTOMER' : @vm : CustomerKeys : @fm
search_criteria := 'ROSTER_DATE' : @vm : 55435 - 1 : '~' : 55535 + 1 : @fm

table = 'WORKSHIFTS'
Open "DICT", table To @dict Else debug
keylist = ''
option = ''
flag = ''
call Btree.Extract(search_criteria, table, @dict, keylist, option, flag)

cnt = dcount(keylist, @vm)

debug

return

I promise no more :biggrin:


At 31 JAN 2022 06:07PM Donald Bakke wrote:


Barry - Are you using Mark's actual data or did you create your own table with the same column names and populated with sample data?

Don Bakke

SRP Computer Solutions, Inc.


At 31 JAN 2022 06:18PM Barry Stevens wrote:


did you create your own table with the same column names and populated with sample data?

YES


At 31 JAN 2022 07:38PM Mark Boorman wrote:

Hey Barry,

Thanks for the efforts. I see you're starting to see differences when you widen the parameters a little.

I've run a few tests overnight and this morning, one of which was removing the roster_date portion and extracting based on customer alone.

No significant difference.

All extracts ranged from 19-23 minutes and all were performed on my personal laptop.

What did make a difference?

Limiting the client list to just the first ten entries of the 1,000-plus given. Then the extract took 376ms.


At 31 JAN 2022 07:40PM Barry Stevens wrote:


Yep, same observations.


At 01 FEB 2022 08:24AM Andrew McAuley wrote:

For the people who are specifying 1,000 values: how many rows would be returned if they omitted the 1,000 values? Just thinking, select without regard to those values and filter the result. As the customer is part of the key, the filtering will be quick.
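A sketch of that idea, reusing the criteria style and the five-part key (employee*customer*date*start*finish) described earlier in the thread; StartDate/EndDate and the other variable names are illustrative:

```
* Extract on the date range alone, then filter by the user's customer list
SearchCriteria = 'ROSTER_DATE' : @vm : StartDate - 1 : '~' : EndDate + 1 : @fm
KeyList = '' ; Option = '' ; Flag = ''
Call Btree.Extract(SearchCriteria, Table, @Dict, KeyList, Option, Flag)

Filtered = ''
NumKeys = DCount(KeyList, @vm)
For i = 1 To NumKeys
   Key = KeyList<0, i>
   Cust = Field(Key, '*', 2)   ;* customer is the 2nd key part
   Locate Cust In ClientSearch Using @vm Setting Pos Then
      Filtered<0, -1> = Key    ;* keep only the user's customers
   End
Next i
```

Whether this beats the 1,000-value criteria depends on how many rows the date range alone returns, which is exactly the question above.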

The Sprezzatura Group

The Sprezzatura Blog

World leaders in all things RevSoft

View this thread on the Works forum...

  • third_party_content/community/commentary/forums_works/0a7b679089c63448ba2918341f2d5763.txt
  • Last modified: 2024/01/04 20:57
  • by 127.0.0.1