Collection vs parent child vs Reference vs One to many

I've been wondering about some code that I have come across a lot over the years.

Let's assume I have class Cinema and class Film.
Conceptually the data in these classes are never really physically deleted but only flagged as such, due to business requirements.

What I find is that developers tend to create a 3rd class "CinemaFilms", in a child relationship to parent Cinema, with a reference to Film. Rather than a one-to-many between Cinema and Film. 

Class CinemaFilms
{

Relationship cinema As Cinema [ Cardinality = parent, Inverse = Films];
Property film As Film;
}

The only reason I can think of, why we do this, is to avoid classes becoming a spaghetti mess of relationships(knowing it will break referential integrity).

Would it make more sense to use Collections here(array of/list of)?

Class Cinema
{
Property films As Array Of Film;
}

But I do not see any real benefit of Collection properties over parent-child relationships whatsoever, especially if Film contained 'lots'(think millions here) of data. In practice it just makes querying etc more difficult.

Would it make more sense to just use a 'stand alone' class with foreign key?

Class CinemaFilm
{
Property cinema As Cinema;
Property film As Film;
Index idxa ON (cinema,Film) [ Unique ];
}

Breaks referential integrity but we are free to index it as needed.
 By now of course I have forgotten what my actual question was, but would be something along the lines of what is the most simple and volume efficient method?

Answers

Nicki

 

An Array of Objects is basically the same as a Parent Child relationship storing the child data in the same global as the parent.  

When people create the third class CinemaFilms they are do that to create a Many to many relationship between Cinema and Films.  One Cinema shows many Films and one Film will be shown in many Cinemas.

 

For performance reasons we suggest using Foreign Keys instead of relationships.  You can still setup the Parent Child behavior with Cascading Delete.

 

I don't see why using a Foreign Key would break Referential Integrity.

 

brendan

 

 

Can you elaborate on this: "For performance reasons we suggest using Foreign Keys instead of relationships."

My understanding is that Relationships will create the foreign keys (and the relationship table) for you similar as you would do in the relational world:

TableA

AID

TableB

BID

 

TableABRelationship

AID

BID

FK: AID

FK: BID

 

Sorry for not answering this sooner.  In Cache / InterSystems IRIS when you define a Parent Child relationship between 2 classes the Child class has an IDKey made up of 2 values:  a reference back to the Parent and a unique identifier.  Because the IDKey is a compound key you can not define a Bitmap index in the Child Ccass.  That is drawback #1.   Bitmap indices are not the be all / end all, but they can give you GREAT performance so it is nice to keep this option on the table.

The Parent Child relationship Storage is the second drawback, while it is not required, and it might not always be a drawback, the classic storage for a Parent Child relationship is to store all the data in the Parent Global. 

Now to explain this I want to make sure everyone knows what I am talking about when I say Global.  InterSystems stores persistent data in a sparse tree structure that we call Globals.   Subscripts are always stored in ASCII Collated order, so if I did it right the way I show the globals here is the same way they would be stored.  A simple persistent class will store the data in a global like this:  ^Package.ClassD(id), where id in a unique integer.  So 4 rows in this table could look like:

      ^Package.ClassD(1)=$LISTBUILD("data 1")

      ^Package.ClassD(2)=$LISTBUILD("data 2)

      ^Package.ClassD(3)=$LISTBUILD("data 3")

      ^Package.ClassD(4)=$LISTBUILD("data 4")

If we define a Child table in this class then we will store that child data in a lower level subscript

    ^Package.ClassD(1)=$LISTBUILD("data 1")

          ^Package.ClassD(1,"Child",1)=$LISTBUILD("Child 1")

        ^Package.ClassD(1,"Child",2)=$LISTBUILD("Child 2")

      ^Package.ClassD(2)=$LISTBUILD("data 2)

        ^Package.ClassD(12"Child",4)=$LISTBUILD("Child 4")

      ^Package.ClassD(3)=$LISTBUILD("data 3")

      ^Package.ClassD(4)=$LISTBUILD("data 4")

        ^Package.ClassD(4,"Child",3)=$LISTBUILD("Child 3")

Now add a second Parent Child relations where the child class is the parent so we have 3 subscript levels

    ^Package.ClassD(1)=$LISTBUILD("data 1")

         ^Package.ClassD(1,"Child",1)=$LISTBUILD("Child 1")

              ^Package.ClassD(1,"Child",1, "Grand",3)=$LISTBUILD("Grand 3")

        ^Package.ClassD(1,"Child",2)=$LISTBUILD("Child 2")

              ^Package.ClassD(1,"Child",2, "Grand",1)=$LISTBUILD("Grand 1")

              ^Package.ClassD(1,"Child",2, "Grand",2)=$LISTBUILD("Grand 2")

      ^Package.ClassD(2)=$LISTBUILD("data 2)

        ^Package.ClassD(12"Child",4)=$LISTBUILD("Child 4")

      ^Package.ClassD(3)=$LISTBUILD("data 3")

      ^Package.ClassD(4)=$LISTBUILD("data 4")

        ^Package.ClassD(4,"Child",3)=$LISTBUILD("Child 3")

It should be easy to imagine this global becoming very large storing all the different children at all the different levels.  This increases the cost of getting the data for 1 class into memory because we need to load all the above data into memory just to get the 4 rows from the top level parent table.  This increased IO is the second issue with Parent Child Relationships.

When you define a Foreign Key between 2 classes the storage structure is not changed.  Both classes by default will have a positive Integer as the IDKey so they will both support Bitmaps.  They will each have their own D global so you will only need to load the data for the "Child" table when it is needed.  

If you go with the Foreign Key instead of the Parent Child make sure you define the FKey to use Cascade on Delete and you need to define an index in the "Child" class on the reference to the "Parent" class.

OK so that was longer than I thought it would be.  Hope it makes sense.  If you have more questions i will try to answer them faster this time.

The 3 options yield the "same" results, however, depending on needs, one requires more work on your part than the other. With Relationships, for example, Cache automatically handles all referential integrity for you (foreign keys are automatically generated for you). Whereas the other options don't provide that and you're required (depending on your needs) to code  the referential checks (e.g. declare foreign keys).