btree index on a relational target index field ??!! (AREV Specific) [Revelation On-Line Wiki]

Sign up on the Revelation Software website to have access to the most current content, and to be able to ask questions and get answers from the Revelation community

At 03 JUN 1999 06:03:57PM Peter Fisera, Synergy Computer Consulting wrote:

AREV Specific

We are in a situation where there is a one-to-many relationship between file X (1) and file Y (many). There is an AREV relational index from Y to X, so that a field in X contains all the keys of related records in Y. If we need to quickly find all records in X that have NO corresponding Y records, there is no easy way to do it, aside from a slow select on an unindexed field (X is a very large file).

In a sudden brainstorm, I tried creating a symbolic that is true if the target field in X is empty, and false otherwise, and then putting a Btree index on this symbolic. As I suspected, it isn't quite that easy. For some reason (probably due to the inner workings of SI.MFS) the automatic update to the target field does NOT update any indexes based on that target field. The symbolic evaluates correctly, but the btree index on it does not update as I had hoped, and rapidly becomes incorrect (only a full rebuild will cause it to take note of changes in the target field, and this is obviously not an option).

Any indexing experts out there know of a way to make this work, or of some other approach we haven't thought of?

thanks,

Peter Fisera

Synergy Computer Consulting

At 03 JUN 1999 06:34PM Matt Sorrell wrote:

Peter,

Perhaps a trigger, oops MFS in AREV terms, or a nightly process that checks to see if values have changed?

Or perhaps instead of a symbolic, have a data field in table X that is update when a corresponding entry is made into table Y (shudder at the thought, I know). This way the index would be updated automatically. This last idea could be implemented through an MFS on table Y so that you would not have to recode anything that writes to table Y.

Just my 2 cents worth (from an obviously not experienced index type person)

Matt Sorrell

[email protected]

At 04 JUN 1999 01:42AM Curt Putnam wrote:

Peter,

Over the course of years, I've learned not to do relational indexes. They don't do anything that cannot be done with a Btree and take much longer to rebuild when corrupt. In your case, I'd suggest putting a Btree on the source relational field (instead of a relational) and using a symbolic to do a Btree.Extract in the master file. An empty Btree.Extract is false, otherwise you have the list. Btree's are fast enough - espeically in the light of the problems you are facing.

At 04 JUN 1999 03:31PM Peter Fisera wrote:

How can I do a BTREE.EXTRACT when I don't know which values I'm searching for? The keys in file Y (the 'many' side of the one-to-many) are the key from file X plus a 2nd key part. However, even if you indexed these, you would need to find keys that are in file X and did NOT appear in the BTREE.EXTRACT of file Y… so you would have to evaluate every key in X, defeating the whole point.

At 04 JUN 1999 03:37PM Peter Fisera wrote:

Actually Matt, for your 'inexperienced' response, that was right on the money. We did consider doing that already, and (as long as the MFS was correctly positioned in relation to SI.MFS) it probably could be made to work. The biggest problem would be an efficiency hit on one of our main data files, by adding another MFS to it (and just the act of telling whether a record has changed requires a 2nd read for each write to that file). That is still an option, but the reports we are trying to optimize are not important enough to warrant that much overhead (but the overhead of adding one more BTREE, if that had worked, would be a lot less in comparison and therefore worthy of consideration). Thanks for your input.

Peter Fisera

Synergy Computer Consulting

At 04 JUN 1999 04:44PM Victor Engel wrote:

Instead of putting an index on a symbolic field, have you tried putting a btree index on the related field? This way you can simply SELECT filename WITH fieldname "". I don't know if the relational update process would defeat the btree update or not, but it's worth trying out.

At 04 JUN 1999 06:02PM Peter Fisera wrote:

The reason we didn't do that earlier was just to save space (I don't actually want to index all the keys in the target field, only the ABSENSE of keys!) but your logic made sense, so I tried it. Unfortunately the same problem occurs: the btree index does not update reliably (even with background indexing and/or forced updates) when keys are added or removed to the target field. It may be the problem I guessed at earlier: when SI.MFS writes to the target field, that write does NOT cause any further calls to SI.MFS for the target file itself. But thanks for trying.

Peter Fisera

Synergy Computer Consulting

At 04 JUN 1999 06:14PM Victor Engel wrote:

You might be able to do something by genning the code and modifying it. You can do this by creating a blank record named !! in the file being indexed and deleting the ! record. When the ! record is regenerated, the source to it is put in !!.

At 04 JUN 1999 08:28PM Peter Fisera wrote:

I just tried looking at some index transaction code, and yes, I think it could be done. In the transaction code for File Y, the record from File X would have to be locked, read, and written, somewhere just before or after the call to RELATER (writing to some 'other' field in File X, that is a flag as to whether the actual target field has values or not, and this field would then have a btree index on it). This is extra overhead (but not as much as calling another MFS) but also a problem if the File X record happens to be locked by a different station. (This SHOULD not happen, with this particular application, but it could in some obscure circumstances… relational indexing will make the update later if the target is locked, but this additional process couldn't, without some data structure to store the update). The index transaction code for File Y would have to wait for the File X record to become unlocked, or the flag field would become corrupt.

I guess another approach would be to have the btree index on the symbolic based on a target field in File X (as I did before) but have the transaction code for File Y also create a transaction for File X, if the value of the symbolic is changing. This might be harder to code, but would end up being more efficient as a lock and write to the File X record would not be necessary (only a read).

Also, there is the more obvious problem that if indexes are added to File Y, the transaction code gets overwritten, so there would have to be precautions so that the custom transaction code is in a separate subroutine and a call to it always added back to the ! record if the indexes are changed.

If we could get around these problems, we might reach our objective with very little performance hit. The coding and testing time, on the other hand, may be substantial. I will discuss it with the client to see if they think it is worth it, for the sake of a few reports.

thanks for your help!

Peter Fisera

Synergy Computer Consulting

At 06 JUN 1999 10:51PM [email protected] - [url=http://www.sprezzatura.com]Sprezzatura Group[/url] wrote:

Completely off the top of my head since I've lost the whole topic here, but why not have the relational update off a symbolic then have the symbolic calculate whatever it is needs to be done? I know the relational update will come after the symbolic executes, but you could post a transaction or something to another batch processor.

[email protected]

Sprezzatura Group

At 07 JUN 1999 12:35PM Maggie wrote:

If you're entertaining wild ideas, here's one (and no, I've never tried this–it's just a brainstorm):

in a test environment of course

-Create a YY file (one-to-one relationship to the Y file). It has one field: the target field for the relationship to the X file.

-Relate the YY file to the X file on the same target field.

-Modify the entry form to update the target field in YY instead of in Y.

-Modify the index transaction code in YY so it posts an update to the btree on the symbolic field in the X file.

-Rebuild the necessary indexes.

-Test the query.

-Modify records in Y (YY target field).

-Test the query again.

If this actually works, then, assuming you never use YY for any other reason, you don't have to worry about recompiled index transaction code that might obliterate the custom code you've added to update the btree in the X file.

Maggie

At 08 JUN 1999 05:37AM Tony Marler @ Prosolve wrote:

I find X and Y table confusing much prefer CUSTOMER and INVOICE in examples ! but I think you need to put in either X*keysfield or Y*sourcefield in position 21 of the dictionary field of the symbolic record i.e "EDIT DICT.filename symbolic"

If it works you should see the notion 'depends on' in LISTINDEX at TCL.

See how that goes.

Tony

At 08 JUN 1999 03:05PM Peter Fisera wrote:

That might actually work! It looks like a bit of processing overhead, due to the extra reads/writes to the YY file, but hey, no indexing-based solution is ever totally free. I will discuss this one with the client and see if they want to try it.

At 08 JUN 1999 05:10PM Peter Fisera wrote:

That 'dependency' function for indexes is interesting, and I had no idea that even existed! Any documentation out there available on it?

However, I tried it for this problem and while it maybe works for other situations, it doesn't work here. This may still be to do with the fact that the field it is dependent on is a relational index (I tried setting the dependency to both source and target field, and got the same result). If you add new records to either File X or File Y the index updates as it should, but if you delete the last File Y record for a given X record (making the relational target field in X null) the index does NOT update correctly. This is similar to what was happening even before the dependencies were set. I suspect it may be a problem in reading OREC after the last target key is removed rather than before, and thus not seeing it as a change that demands an index update. This is just my guess, it's hard to tell without the SI.MFS source code…

So far, it still looks like modifying the index transaction code to produce artificial transactions for the Btree index may be the only solution.

Peter Fisera

At 09 JUN 1999 05:48PM Kgilfilen wrote:

Not sure if I'm missing something that makes this more complex…

First, we never had success with relationals and did not have time for the learning curve using trial and error to figure them out.

Second, indexing on symbolics defeats the purposes of using database systems…

What if you build an index on the field in the parent file that contains record IDs from the child file. You could then build a batch process that does a "select by" of that record key field. The records with empty fields will either be at the front or back of the list, depending on how you do the select. Then you readnext through them, doing whatever you do with records that have that field empty, until you hit records that have something in that field. It won't be lightning fast, but you're dealing with flat files here, not relational database systems.

Remember that if the field in question has been "grouped" anywhere it will have value marks, and won't look empty in a field-wise search, Your "else" code will have to check for value marks, and if they are present, search through the values for record IDs.

Kenny

At 09 JUN 1999 06:02PM Kgilfilen wrote:

Or is the field in the parent file created by the relational index? If so, what are the rules for the index? Do the child records have a field pointing to parent records, denoting which one they belong to? What if you did a select of IDs of the parent file (list A),and a select of the foreign key field of the child file (list B) and a savelist, respectively of each, and did a reverse merge of some kind, to determine which keys from the A list are not in B?

In which case you'd need an index on the key field of the parent file, which you probably already have, and an index on the foreign key field of the child file, which you already have.

Or am I still missing the point?

At 13 JUN 1999 12:01PM Maggie wrote:

Just to put us on the same page, the child file has the target field in it that is the foreign key to the parent file. When you set up a relational index, the parent file will have a field in it that points to the child file (child_keys). The relational indexing process keeps this data up-to-date (notwithstanding the kinds of problems that might cause index corruption). The data itself is stored in the parent file making it just about the quickest way to select related records.

Typically, you would select all records in the Parent file that have nothing in the child_keys field in order to get your list of records that have no 'children.'

The additional factor here is that they want this list to be created QUICKLY. The select mentioned above, although typical, will not be QUICK on a very large parent file. Therefore, all the discussion. It is an interesting poser.

For all the different ways you might approach the problem, they all have a certain amount of processing overhead. In my view, the fastest way to select records in the Parent file would be to first use data that is stored in the parent file (i.e., child_keys), then if you were bound and determined to use an index, then apply a btree index on that data if possible. If that's not possible or advisable, then index a symbolic field if the frequency of its use justifies the work required to build and maintain it.

Personally, I hate indexes on symbolic fields because they require custom indexing code. Now, if there were a way to have the custom code isolated in such a way that it could be preserved if the indexing on the file in question were modified, then I wouldn't hate indexes on symbolics nearly as much. My "wild idea" in a previous posting was to suggest an attempt to isolate this custom code by applying it to a file that had nothing but the target field in it.

In view of this wild idea, the other idea mentioned (putting a btree index on the target field in the child file) has as much applicability as using a traditional relational index. The advantage is that at the point of entry, the btree index is much faster than the relational index (to save and modify records). However, resolving the btree (when selecting records from the Parent file) has a lot more overhead.

In the case of the "wild idea" I proposed (using the YY file), you really could use a btree index instead of a relational index because what matters to the solution at hand is the symbolic field in the Parent file. The custom code applied to the YY file will keep that symbolic field index up to date (given standard index flushing), and reduce overall processing time at the point of entry. In other words, you save time saving records; you save time doing the select for parent records without children. And, in the long run, you save time should you ever need to rebuild an index (and that relational index on the Parent file has got to be a bear to rebuild if they think selecting on the child_key field takes too much time).

The kicker is that you pay for it in runtime processing, typcially for XLATEs from the child file to the Parent file (each one requires a btree.extract call, and after so many calls, you begin to wonder if indexing was such a bright idea in the first place).

At 14 JUN 1999 02:53PM K Gilfilen wrote:

Maggie,

Thanks for the explanation of the problem. Since I had always avoided relationals, the language of relational users was not part of my vocabulary, and I probably did not fully understand the situation. So the field in the parent fie is generated by the index.

Implicit in avoiding relational indexing is the fact that that relationship still has to exist and be maintained somehow. We build it into our presave processes because, at least for my part, I always felt I could put in better error-handling than you find in the stuff that came out of the box. If the link was not created somehow in the presave, the save would cancel, and the user would be asked to try again. At least if that failed, someone would know. As it is with the current indexing system, many times you don't know the indexes are bad until someone in purchasing notices that their report is missing a few lines.

It also ensured that the data in the "index" my presave process was maintaining was persistent, which is not always the case with ARev's indexes.

However, this is not a trivial modification, and probably not useful in the scope of this discussion.

Kenny

At 14 JUN 1999 03:59PM Maggie wrote:

Well, Since your relationals are stored in the same way, you might appreciate how easy it is to check the integrity of the relationship between parent-child files. Although it's not a routine practice for our application, when suspicions are aroused, there's a pretty painless (if slow) way to check the integrity; then it's pretty painless re-establishing the correct relationships without doing a rebuild.

The gist being that you have to evaluate every child record and every parent record against the data in the parent records' "child_key" field: every child record's parent record has to have that child record key in the CHILD_KEY field; then every child record pointed to by the parent record must have the parent key in the target field. I use two symbolics, one in each file, to do this check.

The hard way to fix: add or delete parent key values in the child file to either get rid of incorrect child_key values, or to add ones that somehow never got added.

The easy way to fix: write a program that posts updates. I personally find this to be the hard way–easy only on the users.

On really large files, on networked systems where users are wont to reboot or engage in other disastrous behaviour, sometimes this kind of fix is the only expedient.

At 15 JUN 1999 07:34PM K Gilfilen wrote:

Maggie,

On various occasions I've had to do just what you described; have a program that loops through the parent file emptying that field, and then looping through the child file, writing record keys to the field in the appropriate record in the parent file. I also kept a list of orphan child records (no related parent) to archive at the end of the process. The users could put the process into their scheduler whenever they wanted, or just kick it off manually. In situations where we did the updates in a presave process with good error-handling the error rates of the batch process above dropped to zero and stayed there unless there were network problems, power outages or the data replication system sent rogue updates.

But in my frustration, the only index system I would use with ARev was the Btree indexes. Nothing else was close to reliable enough or fast enough, though some people used the cross references which were okay. We had to design our data models around the indexing scheme, which is at best dysfunctional.

Now I am in telecommunications doing C++ programming. With that you have your choice of commercial libraries that are highly refined and well documented. You could probably implement an index scheme for ARev using commercial libraries and C++ calls in a relatively short time. Of course that would be re-inventing the wheel…

Kenny

At 16 JUN 1999 05:10PM Maggie wrote:

Kenny,

Well, your program that loops through the file, essentially refreshing or "rebuilding" the relationship is exactly what rebuilding a relational index does with the one exception that with a standard rebuild, there are no "orphans"–parent records get created for them. Users call them "ghost" records or "skeletal" records.

We should probably knock it off. We're getting pretty far from the point of discussion.

Back to Y2k now…
Maggie

View this thread on the forum...