Update indexes before query - Apply to Btree Extract? (OpenInsight 32-Bit)
At 22 APR 2009 03:02:55PM Bob Carten wrote:
We (RTI) are thinking that we could modify Btree.Extract so that observes the 'update indexes before query' environment setting.
The benefit would be that btree searches would reflect updated indexes.
The cost would be increased contention for indexing resources. I think this would be a good change for small systems (< 10 users ). Larger systems might have problems. Any feedback from the community?
At 22 APR 2009 05:53PM Matthew Crozier wrote:
We're already doing that, and haven't noticed any problems. We shell btree.extract with a routine which will update the index(es) before searching.
Cheers, M@
[img]http://www.vernonsystems.com/images/logo_main_ani.gif[/img]
At 22 APR 2009 06:24PM Martin Drenovac wrote:
Bob what's the real longer term solution.
When will I see the indexer as a service running alongside the UD4.x driver, so that all indexing "just" happens without this low level developer involvement.
Cheers
At 22 APR 2009 06:33PM dbakke@srpcs.com's Don Bakke wrote:
Bob,
I would suggest another environment flag such as "Update Indexes Before Btree.Extract" so that it can be easily enabled/disabled.
Alternatively, add another (optional) argument to Btree.Extract that will tell the underlying logic to flush the index or not.
dbakke@srpcs.com
At 22 APR 2009 06:56PM Barry Stevens wrote:
]]I would suggest another environment flag such as "Update Indexes Before Btree.Extract" so that it can be easily enabled/disabled.
I Agree
At 22 APR 2009 10:48PM Warren Auyong wrote:
Yes, an universal environment flag would be great. An additional optional argument on BTREE.EXTRACT that would override the environment setting would be a must.
There may be times when you would not want to update the indexes before calling BTREE.EXTRACT even when the environment flag is set.
At 22 APR 2009 11:25PM Bob Carten wrote:
Thanks for input. Will look at added flag.
]]What's the real longer term solution? …
We (RTI) agree that a separate service is needed.
I have sketched out some designs; our work with the socketserver and UD Heavy has provided experience with multithreading and queued transactions.
My current thinking is:
* Service is responsible for queing transactions, building and retrieving index rows. Not responsible for creating transactions ( oengine's scope) or storing them ( UD's scope )
* We should generate index transactions in the client oengine because symbolics need context ( the current set of attached tables, local time, @station, etc )
* No 0 record, rather message to transactional queue, perhaps MSMQ ( part of Microsoft XP and above ), Apache queuing service or other. Queue eliminates locking contention on 0 record, could que the primary record too, so primary write fails if index updates fail.
* Transactions are queued in order, updates only allowed from service, thus no locking needed on !file
* Because service controls all writes, it could control all reads too, safely serve from cache like READO.
…
That is about as far as I've gotten.
I hope to prototype it this year. I want to look at bitmapped indexes too. They complement rather than replace BTREE. I still love Relational indexes. I put them in their own table ( e.g. CUSTOMERS has a CUSTOMERS_REL table, calculated fields like @ans=Xlate('CUSTOMERS_REL', @ID, 'INVOICES','X' ) ). With no locking contention and no 64 limit they are very reliable.
At 23 APR 2009 01:42AM Glenn Groves wrote:
Just a note that on UniVerse the indexes (Btree only as far as I am aware) are updated as part of the WRITE or DELETE statement execution, i.e. there is no separate step or delay in the updating of indexes. This is my personal preference.
At 23 APR 2009 07:55AM John Bouley wrote:
I agree that adding a new environment flag and optional btree.extract parm is the way to go. Most of our code does a flush before. If the indexes do finally become a transaction to the UD then this whole option will become moot (I think).
Bob, what happens when an index is added? Will a new thread be spawned to generate transactions or does this still have to be handled at the client level?
John
At 23 APR 2009 08:53AM Martin Drenovac wrote:
Bob - excellent prototyping. I think it's great that you're approaching it from this tack of using reliable existing techniques to integrate into OI.
In terms of indexing, I think the single complaint that we experience is the rebuild time for indices and as I've stated in other postings the actual referential integrity of knowing that the indices have gone south themselves.
I've also previously posted and currently testing an intention to hybridise my indexing by using some server based SQL for the indices whilst maintaining the data on OI. And, with the accessability of ADO all of this is much easier to approach.
At 23 APR 2009 10:43AM Richard Hunt wrote:
In UniVerse, each index has the option to "DISABLE", "ENABLE" and "UPDATE"; otherwise each index is automatically updated during the "WRITE", "DELETE" and "CLEARFILE" commands.
The "DISABLE" command disables the automatic updating of the index. The "ENABLE" command enables the automatic updating of the index. And the "UPDATE" command will update a disabled index.
My experience is that UniVerse indexes seem to process much faster than OpenInsight indexes. It just seems that there is way too much "overhead" in the OpenInsight indexes, or maybe it is the UniVerse index file format that is the advantage.
Btree indexes definately has way too much "overhead" compared to simple indexes. I think that Btee indexes are way too overrated.
After working with indexes for quite a while (about 14 years with Revelation), I have noticed that simple indexing does not corrupt, simple indexing is very fast, and building a simple index does not take long at all.
With sorting taking almost a blink of an eye, simple indexes for fast selects, and READO caching for reporting, I just don't see the use for Btree Indexes. Too much indexing time for too little benefit. Even with tables having millions of rows and nine simple indexes.
At 23 APR 2009 09:38PM Glenn Groves wrote:
Richard, what is the difference between a simple index and a BTree index? I vaguely know what a BTree is, but not a simple index.
At 24 APR 2009 10:45AM Richard Hunt wrote:
All I am trying to do is to point out why a Btree update or rebuild would be much longer than a simple index update or rebuild. Also remember that this is my opinion.
First off, I would like to say that if you really do not need the Btee indexing and you could use another type of indexing, you might want to consider using the other type of indexing.
Btree…
A very hard and lenghty thing to explain. I will state my opinion briefly. I am going to relate it to the linear hashed filing system, and this is not an accurate way of explaining it.
Btree is not something that was originally designed by Revelation. It is kinda like a linear hashed file except it is not so efficient on expanding and contracting. Also it sorts the information within the index. It breaks up the actual information you need into chunks of data. What I mean is that the extract will have to read quite a few times to retrieve the information you want.
When the index is being built or rebuilt or just simply updated, it could take quite a few read / writes to complete just one update.
So all the time this index is maintaining the sort within the index is kinda wasted if you do your own sort after extracting from the Btree.
I think that Btree indexes within a linear hashed environment is kinda useless and causes way too much overhead. Since linear hased files already break up the information when needed to balance the file load, and we do not have a 64k barrier anymore, why have a Btree index do the same.
__
Simple indexes…
My idea of a simple index would be kinda like a xref index. Basically each row key in the index is the index value and the row is the list of indexed row keys that meet the index value criteria. Kind of like "saved lists".
This type of simple index would be quite efficient in rebuilding and updating. The update part would be just like a read from a normal folder.
The rebuilding of this type of index would be very simple. Since we have access to the V119 sort, and the V119 sort allows you to extract the whole sort string from the sorted information, you would use the extracted sort string to rebuild the index ( @ID:FM$:INDEX_VALUE:RM$ ) or create a list of @ID's until the INDEX_VALUE changes then write the list of @ID's and that index value is updated with one WRITE command.
An example of using this type of indexing would be for dates of invoices. If the user wants to see the invoice amounts for all the April 2009 invoices, then you would extract and merge all the index values with the criteria inclusive "04/01/09" "04/30/09". Then you would sort the information and report it.
At 24 APR 2009 12:58PM Bob Carten wrote:
Richard,
Thank you for these explanations.
They are helpful and match some of my thinking.
I think of Btrees as optimised for range or LIKE searches and traditional ( in REVG XREF, in Arev Relational ) indexes as optimised for exact or EQ searches. Bitmapped indexes ( thank you for that idea Sprezz) are almost a special case of Relational. You are correct, set operations on these indexes is very effective.
I think the physical implementation of these structures was compromised by the fact that all they had was the peer to peer LH filing system with a 64k length limit. I think it is now time to address these compromises. This is not a trivial task, but we have much better tools now. We are still in the discussion stage of this project.