Question
· Jun 15, 2017

Ways and methods of Accessing Relationships with SQL

I am experimenting with Relationships, both Parent to Child and One to Many.

I have done some SQL look-ups and have searched through the documentation, however not in a lot detail, but wonder if there are more and better ways to access both sides of Relationships through SQL?

Thank you in advance for any help provided.

Discussion (25)2
Log in or sign up to continue

For relationships, are they defined in the classes (parent/child)? Something like:

In the parent table:

Relationship MyChild As ABCCompany.Data.Parents.Kids [ Cardinality = children, Inverse = ParentsInfo ];

and in the child table:

Relationship ParentsInfo As ABCCompany.Data.Parents [ Cardinality = parent, Inverse = MyChild ];

Then, just select from whichever table you need, and join on the relationship field. I typically select from the child table and use the arrow syntax ( ->) to get the parent data for each entry.

I am not a fan of parent/children relationships. Think about what this really means. Kyle's comment about the data being nested beneath the parent object in the global is just the default physical storage model. The user has control over storage and can change that structure. But consider the logical behavior that PARENT cardinality brings. First, a relationship is like a foreign key - it is constrained by a referential constraint and it is subject to referential actions. Since this is a foreign key referencing the IDKEY of the parent class there is no need to define an ONUPDATE action (IDKEY values cannot be changed) but ONDELETE must be CASCADE. Why? Because PARENT cardinality establishes an EXISTENCE dependency on the parent. You can't change that unless you change the cardinality to ONE. Secondly, the IDKEY of the child class implicitly includes the parent relationship. This establishes an IDENTIFICATION dependency on the child class. You can't identify (the ID/OID) an instance of the child class unless you know the ID of the parent class. Furthermore, once a parent value is assigned and saved, you cannot change the value of the parent property. To do so would identify a different object and it isn't allowed. So - PARENT defines existence and identification dependencies.

To summarize PARENT/CHILDREN behavior:

1. Existence dependency;

2. Identification dependency;

3. Physical storage default is nested (artificially creating an implicit ONDELETE CASCADE behavior at the physical level - IMO this is not a good thing!)

 

And think about other ways to achieve the same thing using ONE/MANY or even a simple foreign key:

1. Existence dependency - make the relationship in the child class (or property if using fkey) REQUIRED;

2. Identification dependency - define the IDKEY and add the relationship with cardinality = ONE to the IDKEY;

3. Physical storage model nested - I don't recommend this model for ONE/MANY. 

The advantages of ONE/MANY:

1. The user has control over dependencies;

2. More ONDELETE/ONUPDATE options;

3. The option to base the IDKEY on a positive integer value, allowing the use of bitmap indices;

What does a relationship do for you that a foreign key does not? One thing. It maintains the relationship in memory when either side of the relationship is in memory (swizzled). This can be a good thing. It can also be a bad thing if there are a large number of related objects in the n-cardinality side.

Hi Kyle

Thanks for your excellent comment

I agree - sort of....

Bit it's a balance (as always) between loading buffers - it may be the case that there is an occasional need to just grab the dates - but if it's only a 10% (say) need whereas the 90% need is to display/process the header and the lines together then, for me, the 90% need should win out.

Also if the dates (or whatever) are indexed then a selection on a date range (say) will only select the required rows from the index.

= =

But as I said before - it depends on the size of the system - my clients have modest needs (maybe 3 million rows max) and with a 64Gb machine all/most of the blocks are in memory anyway smiley

 

But thanks for the thoughts - I will certainly start looking at one-many with a cascade delete

Peter

In that case, if you EVER want to query the headers (say, you want information on the dates, which would exist in the header) then you will always be loading the children into memory.  This is clearly inefficient.  
 

You can enforce each Line having a Header by make the property required, and you can use Foreign Keys to cascade deletes as well.  The only time to use Parent-Child is if you will NEVER query the parents without the children (or the tables are to be so small as the performance hit you take doesn't matter).

Hi Peter!!

Your comment is correct of course. But think about the behavior and not the physical model. Everything that PARENT/CHILDREN relationships offers can be easily replicated using ONE/MANY with the only differences being the default storage model and the composite IDKEY that is required by PARENT/CHILDREN. With ONE/MANY you have the advantage of using system assigned integer ID's (frees you up to use bitmap indices - think product/invoice line perhaps) and you have more flexibility with referential actions. PARENT/CHILDREN requires ONDELETE=CASCADE. With ONE/MANY you have the option for ONDELETE=NOACTION/CASCADE... And, of course, if you with to establish existence dependency between the parent class and the child then simply add a REQUIRED constraint to the relationship whose cardinality = ONE.

Hope to see you at Global Summit!

 

Dan

Peter, please clarify your comments:

>>>>That for any reasonable sized database (ie less than <~10,000,000) rows on a modern server with adequate memory - it >>>>don't much matter!!! Cache will do the biz.

What do you mean Cache will do the biz?

 

>>>>>But give it a go
>>>>>As I said it don't much matter unless you are in serious terabyte country

Give it a go? Give what a go?

And what is serious terabyte country?

 

I am sorry, but I am just not following what you are saying.

I'm not following you on this part:

Having real children with real parents then the parent-child relationship is incorrect on many levels - because as I said before "a parent can die" and you don't want the children records to disappear!!!!

Are you talking real life parents and children, or data? If data, in a parent/child relationship, the child record is dependent on the parent, and does not exist on it's own. If you're wanting to keep the orphaned records when the parent record is deleted, then a parent/child relationship may not be the appropriate setup. Then again, you could disable cascading deletes, at least with other db's, and I assume with Cache.

This is a very interesting discussion. I do get the point that while our code looks like looping through a set of fields, behind the scenes the entire physical data block containing each item is pulled into memory. And if our items are inside parent objects in a p / c relationship, then the children are located in the same block and this is quite inefficient if and when we are just interested in the parents. Definitely something I hadn't properyly considered before.

But as Peter points out, mostly we are interested in looking at / processing both together, that's why they are modelled as P / C in the first place. And the marketing spiel of Intersystems for the last few decades was that the "clustering effect" achieved by hiearchical global tree structures was what made Globals so efficient, as data that is commonly used together is stored physically together.

So my personal take-away from this is not that p / c relationships are obsolete but that I need to consider more carefully the use cases for data access, in particular how likely it is that the top level information only will be required, before committing to it.

Hi Wolf

Long time no meet !!!

This is such an interesting conversation...

And it all depends on on the actuality - if you have a parent with a sqillion children - then the answer to p/c performance is not so good

Also if you have a business object where the many side are constantly added to - eg a patients lab results- again different - leads to block splitting and a bunch of pointer blocks having to be loaded into memory

So...

my business case argument is with an Invoice header and invoice lines then parent/child v one/many is the most efficient - they (really never change) so no index bock splitting

But I do take Dan's and Otto's comment about bitmaps only doing integer ID's

= =

Tell you a story..

around 12 tears ago I was involved with a client with a 1.2Tb database - I was critisised by the in-house staff for IT staff for using naked global references in one module - blab blah - but it was 2% faster than full global references and  in a tight loop 2%  meant 30 mins off the processing time

= =

Having said that - I *will* be trying out one-many with a cascade delete

Peter

@Wolf Koelling, I have been researching this because we encountered <Store> errors when processing data stored within a parent child relationship. <Store> errors occurred because of child data being loaded into memory. This breached default memory allocated to a cache process of 262,144 Kbytes (262 MB).

We initially increased cache process memory using $ZSTORAGE to get rid of <Store> errors and kept doing this till we had increased it to 2048,000 Kbytes (2 GB)! We will be refactoring our code to use one-many relationship instead, as soon as possible.

My take away from this is that it's better to stay away from parent-child relationships and use one-many relationships by default.

You don't need to do it this way. Instead, get the id of the parent object and then, as you create each child object,

do childobj. <ParentPropertyName>SetObjectId(parentId)

and save the child object. That should solve your memory problems.

This is a technique that doesn't just apply to relationships but to any object property where you can avoid swizzling it into memory.

Caché Objects persistent classes project to SQL as tables. Caché Objects relationships are binary - a relationship must have a defined inverse relationship in the related class. We only support relationship/inverse relationship cardinality pairs of one:many and parent:children (a special case of one:many where the parent is implicitly required and is also implicitly part of the IDKEY). A Caché Objects n-cardinality relationship (children, many) is transient in the container and there is no projection to SQL. A 1-cardinality relationship is stored and projects to SQL as a column that is constrained by a foreign key constraint. A simple query can be used to retrieve the contents of the unprojected n-cardinality relationship. Consider two tables, A and B, related by properties b and a. A.b is of type B and specifies cardinality = many.  B.a is of type A and specifies cardinality = one.

The value of A.b can be determined by this query:

select %ID from B where B.a = ?

and specify '?' as the A.%ID value. In fact, the internal relationship implementation uses a very similar query to populate A.b on %Open.

Hi Dan, Otto and Kyle

I disagree !!!

If you think about real world examples - eg invoice header and invoice lines

for me it makes no sense for invoice lines to exist without a header - I would want the db to enforce this as it does with parent child relationship

Agreed that the global buffers will have to load more data - but then again it's most likely that you want to see the lines whenever you see the header,

But I do agree with Otto re bitmap indices - but it has to be some huge database for this to impact on performance and if you are in this sort of size that it makes a diff then it's a different problem entirely and requires some deep thought re the design

Peter

FAO Mike Kiddow

All of us Cache gurus are getting carried away with the internals and very deep design concepts that rely on an in-depth knowledge of how Cache works.....

I guess you do not fully follow what we are discussing - (sorry not being dismissive at all) - it's just that me, Wolf, Dan Otto, Kyle are just being super-dooper Cache egg heads.

Bottom line is...

That for any reasonable sized database (ie less than <~10,000,000) rows on a modern server with adequate memory - it don't much matter!!! Cache will do the biz.

= =
But please note that the example from Scott is not correct...
Having real children with real parents then the parent-child relationship is incorrect on many levels - because as I said before "a parent can die" and you don't want the children records to disappear!!!!

In case I would have (as a first pass) a class "people" and another class "people relationships" which links different instances of people with each other

= =

oo methodology seems intuitively easy to understand - but it's not
and on top of that is the pros and cons of Cache performance considerations that we have been discussing

But give it a go
As I said it don't much matter unless you are in serious terabyte country

Peter

Wolf

Thinking about this the first point is do the children get added to or amended - if it's a lot (eg with case notes) then that will lead to mega block splitting as the data size for the whole record grows

OTOH - if it's an invoice that (essentially) never changes then it's fixed and p/c is cools

==  ==
Apart from the bit map indices only working for integer ID's - do if you really have squillions of products that you need to process then one-many must be the business solution so it gives the performance of

   select sum(qtySold) from invoiceLines where ProductID=123

Peter

FAO Scott

Hi Scott

Sorry it's taken me so long to get back to you re real world parents and children

It's the example you gave of Cache relationships - and I was a bit quick fire with my answer - sorry!. 

Please let me expand.....

Real world parents and children is an interesting problem and a simple solution that you described is *not* the way I would model it!!

this is regardless of the actual implementation - eg parent/child or foreign keys etc etc

= =

If i wanted to model family trees I would have a class "Person" and another class "PersonRelationship" - the second of these would have links to two instances in the "Person" table

Something like (in relationships rather than foreign keys - but that's implementation rather than the oo design)

PS - I am typing on the fly - so there may be errors!!!!

= =

Class Person as %Persistent

Relationship rRelatedPerson as PersonRelationship [cardinality="many", inverseproperty="rLink"]

property Name as %string;

property pDoB as %Date;

....etc

And then PersonRelationship as %Persistent

Relationship1 as Person[cardinality="one", inverseproperty rRelatedPerson]

Relationship2 as Person[cardinality="one", inverseproperty rRelatedPerson]

property RelationshipType as SomeLookupTable;

....etc

The SomeLookupTable would describe (in words) the relationship eg "Son Of" and "Farther Of"

For me this has some beauty
You can...

  • Construct a family tree of infinite depth both forwards and backwards
  • Use recursive SQL programming to construct the tree
  • a "child" can have multiple links - eg "gene father" and "step father"
  • The record is complete - eg a woman might have a "gene father" then a "step father" and then go back and be linked to the same "gene father" (life happens)
  • It can also model surrogate parents via AI fathers or surrogate mothers or same sex relationships
  • It can be extended to, say, pets

Some care has to be taken in the database insert/amend eg

  • avoid recursive relationships eg a Person is her own grandmother is not physically possible
  • a person cannot have the same mother and father (well with AI and embryo manipulation tailoring this *may* become a reality - but no problemo the model will still work)

= =

Hope this is clear and of interest - if you need any more info please ask

= =

PS - I was involved around 30 years ago in computing an "In Breeding Coefficient" for rare breeds (think small populations of endangered species in zoos) the aim was to give a metric on how inbred an individual was - small populations where is was common for both grandparents to be the same individual - the logic was intense to get a metric - you could have the case where the same individual was all of the both grandfathers and all 4 great grandfathers- not so good for preserving the gene pool !

Peter

FAO Mike

Hi Mike

You ask some very good questions!

It's been a few (10+) years since I did some metrics on Cache performance - working on it now (when I have time)

I will publish the results here

But my initial findings are much what I said - given enough memory (8Gb allocated to global buffers) it don't much matter - so a 200Mb global (10,000,000 rows with links to 2 other tables) I can't see any significant diff in the performance between parent/child or foreign key - or bitmap re normal indicies.

going to try it with 50,000,000 and limit the amount of memory allocated to global buffers

Watch this  space

Peter