Question
· Oct 26, 2017

Differences between INNER JOIN and ->

Greetings to all!!! Suppose there is a table Mother (ID, Name) and Child (ID, Name, Mother), Mother in the table Childis a relationship. Let's say the task is to deduce the names of all the children whose their moms' names start with the letter 'A', I can do this in two ways in sql, and I can not understand the difference, the pros and cons that when to use:

1) SELECT Child.Name FROM Mother INNER JOIN Child ON Mother.ID = Child.Mother WHERE Mother.Name LIKE 'A%'
2) SELECT Child.Name FROM Child WHERE Child.Mother->Name LIKE 'A%'

Discussion (1)0
Log in or sign up to continue

This is a good question - it causes a bit of confusion for those starting out with arrow syntax.  When you use -> you are doing what we call an "implicit join".  If you look at the docs here:

http://docs.intersystems.com/latest/csp/docbook/DocBook.UI.Page.cls?KEY=...

You'll see that an implicit join is a LEFT OUTER JOIN.  This is very different from an INNER JOIN as it will return rows from the first table whether or not the ON clause is satisfied.  Now, if your queries were:

1) SELECT Child.Name FROM Mother LEFT OUTER JOIN Child ON Mother.ID = Child.Mother WHERE Mother.Name LIKE 'A%'
2) SELECT Child.Name FROM Child WHERE Child.Mother->Name LIKE 'A%'

Then these queries would be exactly the same, and you should use the one that you find easier to read and work with.

2 additional notes:

1) LIKE kind of sucks when used in this way, because the optimizer isn't given the parameter.  Therefore it code generates assuming the parameter could be '%'.  This leads to poor performance.  If you are looking for the first letters in a string, you should use %STARTSWITH like so:

SELECT Child.Name FROM Mother INNER JOIN Child ON Mother.ID = Child.Mother WHERE Mother.Name %STARTSWITH('A%')

That will allow a ranged read of the Name index (which I assume you have) instead of a full scan.

2) If you set these up as a parent-child relationship in your class definition, you should change it to one-to-many or to use foreign keys.  The reason for this has to do with the storage.  Without going into too much detail, if you have a parent-child relationship they are stored on disk together, with each parent very close to its children.  This causes performance problems if you want to look at just the parent information alone, as you'll have to load in the child information, even if you don't need it.