Replies by Brendan Bannon for InterSystems Developer Community

Brendan Bannon · Oct 30, 2019

Sergey

The biggest difference between the 2 queries is the second one is really a JOIN, not a Left Outer Join. By adding the second WHERE condition on a property in pcu you are making this a JOIN and allowing the Optimizer to start in either table. I created the tables on my system and here is what I am seeing

The first query starts in ps using the item index and then joins to the pcu table using a temp file built from the master map (same as the plan you show)

The second query is starting with the pcu table and the doing a join to the ps table using a temp file build from the Item index and the master map.

Why don't we use the Item index in pcu? I can't tell you exactly why, but I know the 2 biggest factors we are looking at are the size of the maps on disk and the Selectivity of the properties involved. In both cases, we are going to need to read the master map so the question is this: Will reading in the Index map reduce the number of blocks we need to read of the master map to make it worth the effort? Without proper values generated by TuneTable, or supplied by you, the optimizer is left to guess at what these values are. In the first case, it has decided that reading the index will not reduce the number of block reads of the master map so it is not using it. This generally happens when we think the Master Map is very small.

To take this any further I would need you to run TuneTable on both of these tables and see if the poor plan is still used. If it is still used please provide the storage info for both classes so I can investigate this.

Brendan Bannon · Oct 30, 2019

Hi Kurt

Sorry about that. I have corrected the issue with downloading studio.

Please let us know if you run into anything else.

Brendan Bannon · Oct 30, 2019

Could you please include the query that is getting this error?

Brendan Bannon · Aug 15, 2019

I agree with Danny. I would want to see the classes and the Show Plan for the 2 queries.

Did you define any indices in the classes and not populated them?

Brendan Bannon · May 16, 2019

Keith, we seem to have a problem with creating the link. If you email me I can send you the samples: Brendan@intersys.com

Brendan Bannon · May 15, 2019

Ya I am trying to figure that out. I now have a link that looks like it should work but it does not.

Brendan Bannon · May 15, 2019

Keith

Here is a simple example I came up with that can show you the 2 different ways to do this. As the comments in the class state the restriction for Mapping.OneClassParentChild is every parent must have at least 1 child as the IDKey for the rows is the Child row. If you need to support parents without children this will not work. I will need to think about how we might be able to do that if that is needed. I would also push harder for the 2 classes and a view approach if that is needed.

The Main trick here is to make understand that each subscript level is made up of 3 different parts that you can change or reference:

1) Data Access Expression and Data Access Variables: This is the global and subscripts to be used at this level by the Next code, for example, you could have 3 levels that loop over 1 global and then at level 4 switch to a different global and only use 2 of the subscript values.... Data Access Variables are variables you can set at a given level before the Next code is executed. You can use them at this level or any higher level and you can set them to values from lower levels.

2) Next Code: This is the code to get from one value row to the next in a table. This could be the default $ORDER() code that is generated for a simple Subscript, or it could be override code you provide.

3) Invalid Conditions: If you provide Next Code at a given level then you should also provide Invalid Conditions. This is a test you can use to see if you have a valid row when the subscript value is provided as a constant instead of needing to loop to find the next one.

For your class I reference the Data Access Expression at Subscript Level 3 {3D} in the data mapping. This would be the global with the subscripts defined from the 2 lower Subscript levels: ^Mapping({L1},{L2})

OK so that was a lot of details you did not ask for, but just in case someone wanted to understand all the gory details I figured I would write them down.

This xml file has the different classes I created: Examples

Brendan Bannon · May 15, 2019

Hi Keith

You can map anything!

the question is do you really want to and is it the best thing to do?

This looks like a classic Parent Child relationship that would be defined by 2 classes. You could then create a view that would JOIN the 2 tables together and expose that view via ODBC and you would have what you want.

I can create a simple example that will show this data in one class, but I don't think it will add much value.

Brendan Bannon · Apr 23, 2019

Giray

Your code does not match your class descriptions. In your code you are using a property, Parent to do the child lookup, but there is no parent property defined in Example.Child.

If you truly want a Many to Many then you can't define a Parent property in the child, unless it is a list, and I am not recommending that!

I am not a fan of Lists of Objects, but this might not be too bad. I think I would still rather go with an array of objects as that would give you a third table and line up with the classic Many to Many format using 3 tables. If you want to stay with a List then I would suggest that you pass the List of Child IDs into the QueryChildren method. You can then use the list of Object IDs to get each child via SQL or Objects.

Brendan Bannon · Feb 13, 2019

Sorry for not answering this sooner. In Cache / InterSystems IRIS when you define a Parent Child relationship between 2 classes the Child class has an IDKey made up of 2 values: a reference back to the Parent and a unique identifier. Because the IDKey is a compound key you can not define a Bitmap index in the Child Ccass. That is drawback #1. Bitmap indices are not the be all / end all, but they can give you GREAT performance so it is nice to keep this option on the table.

The Parent Child relationship Storage is the second drawback, while it is not required, and it might not always be a drawback, the classic storage for a Parent Child relationship is to store all the data in the Parent Global.

Now to explain this I want to make sure everyone knows what I am talking about when I say Global. InterSystems stores persistent data in a sparse tree structure that we call Globals. Subscripts are always stored in ASCII Collated order, so if I did it right the way I show the globals here is the same way they would be stored. A simple persistent class will store the data in a global like this: ^Package.ClassD(id), where id in a unique integer. So 4 rows in this table could look like:

^Package.ClassD(1)=$LISTBUILD("data 1")

^Package.ClassD(2)=$LISTBUILD("data 2)

^Package.ClassD(3)=$LISTBUILD("data 3")

^Package.ClassD(4)=$LISTBUILD("data 4")

If we define a Child table in this class then we will store that child data in a lower level subscript

^Package.ClassD(1)=$LISTBUILD("data 1")

^Package.ClassD(1,"Child",1)=$LISTBUILD("Child 1")

^Package.ClassD(1,"Child",2)=$LISTBUILD("Child 2")

^Package.ClassD(2)=$LISTBUILD("data 2)

^Package.ClassD(12"Child",4)=$LISTBUILD("Child 4")

^Package.ClassD(3)=$LISTBUILD("data 3")

^Package.ClassD(4)=$LISTBUILD("data 4")

^Package.ClassD(4,"Child",3)=$LISTBUILD("Child 3")

Now add a second Parent Child relations where the child class is the parent so we have 3 subscript levels

^Package.ClassD(1)=$LISTBUILD("data 1")

^Package.ClassD(1,"Child",1)=$LISTBUILD("Child 1")

^Package.ClassD(1,"Child",1, "Grand",3)=$LISTBUILD("Grand 3")

^Package.ClassD(1,"Child",2)=$LISTBUILD("Child 2")

^Package.ClassD(1,"Child",2, "Grand",1)=$LISTBUILD("Grand 1")

^Package.ClassD(1,"Child",2, "Grand",2)=$LISTBUILD("Grand 2")

^Package.ClassD(2)=$LISTBUILD("data 2)

^Package.ClassD(12"Child",4)=$LISTBUILD("Child 4")

^Package.ClassD(3)=$LISTBUILD("data 3")

^Package.ClassD(4)=$LISTBUILD("data 4")

^Package.ClassD(4,"Child",3)=$LISTBUILD("Child 3")

It should be easy to imagine this global becoming very large storing all the different children at all the different levels. This increases the cost of getting the data for 1 class into memory because we need to load all the above data into memory just to get the 4 rows from the top level parent table. This increased IO is the second issue with Parent Child Relationships.

When you define a Foreign Key between 2 classes the storage structure is not changed. Both classes by default will have a positive Integer as the IDKey so they will both support Bitmaps. They will each have their own D global so you will only need to load the data for the "Child" table when it is needed.

If you go with the Foreign Key instead of the Parent Child make sure you define the FKey to use Cascade on Delete and you need to define an index in the "Child" class on the reference to the "Parent" class.

OK so that was longer than I thought it would be. Hope it makes sense. If you have more questions i will try to answer them faster this time.

Brendan Bannon · Nov 22, 2018

Hi Jimmy

If I understand correctly the only table that will support a bitmap is the Order Table on Server A. All the other tables have compound keys and will not support bitmaps.

It is possible to keep the Parent Child relationship and split the children out into their own globals. You just change the Global node in the Mapping to something else. That will make your Journal filter code a little more complex but it could improve performance a good bit.

The fact that you have OrderDetails and OrderDetails2 makes me think there is a lot of details for every order. If are storing a lot of data for each row and each order has a large number of children your global is going to be large and that will slow down query performance. Just switching the children to different globals could help.

Building Indices talks about about the fastest ways for you to rebuild an index on different types of systems. You might find this helpful for your shadow filter code.

I still recommend using default IDKeys and Foreign Keys but without know what the really important queries are and the Properties and TuneTable data it is hard to know how hard I should push you in that directions.

Bitmaps are great for properties that have fewer than 10,000 distinct values. The fewer the values the better the performance. For example Gender is a great property to use with a bitmap while PatientID is a very bad property to bitmap. The other big win with bitmaps is using multiple bitmaps together to solve a query. While we can do this multi index solution with standard indices as well the performance with bitmap is MUCH faster. For example if you want to find all the men who live in a given state and have a given eye color your where clause would be: WHERE Gender = ? and State = ? and EyeColor = ? If you have a bitmap on each of these properties the query plan would contain something like: Combine Bitmap GenderIndex INTERSECT StateIndex INTERSECT EyeColorIndex

Now you are thinking to yourself that you could define a standard index on (Gender, State, EyeColor) and get great performance and you would be correct, but what happens when the query changes so it does not include Gender? Now your compound index will not perform as well but the bitmaps will still be used and give great performance StateIndex INTERSECT EyeColorIndex. Compound indices are great to solve a specific query, but defining multiple bitmap instead gives you much more flexibility to solve more queries.

Hope this helps. If you want to get into more specific questions about a query I can help with that.

Brendan

Brendan Bannon · Nov 22, 2018

Hi Sean

2 reasons:

1: In general you get better performance if the data is stored in 2 different global. The only time having the data in 1 global is better is when you query both the parent and the child together. If you write a query against just one you still end up loading the other table as well. So over all 2 globals means less disk IO.

2: You can't use bitmaps in the child table because the ID of the child is a String and Bitmaps only work when the IDKey is a positive Integer.

The downside of using a Foreign Key is with Objects you don't have the methods to get at children like you do with a Relationship.

Brendan

Brendan Bannon · Nov 21, 2018

Hi Jimmy

Just to make sure we are talking about the same thing: in Cache the Primary Key is not always the ID Key. I think you are really talking about the ID Key, used in the subscripts of the D global. If I am correct then you have an index define on Year , Month, OrderNo with IDKey True.

From a Parent Child point of view having a compound IDKey is fine. The IDKey of the parent will be something like "2018||11||1" and then the IDKey of the children will just add one more subscript: "2018||11||1||child 1"

Generally speaking we recommend that you leave the IDKey as a system generated integer and instead of using a Parent Child relationship you define a Foreign Key with Cascade on Delete to get the same behavior. You can still define a Primary Key index on Year, Month, and OrderNo so deleted will be fast. By keeping the data in different globals and using Integers as IDKeys you will be able to define a Bitmap index in all of your classes and will keep disc IO to a minimum when queries the different tables.

Now maybe you say you will not be using Bitmaps and you will never query the children without also referencing the Parent data, so having the data in the same global makes sense. You need to look at everything case by case.

Will it work, Yes.

Will it perform the way you want it to, Maybe.

Sorry I can't give you a simple yes or no, but hopefully I have provided the needed info so you can make an educated decision. As always InterSystems Support is happy to go over your specific details and help you come up with what will work best.

Brendan Bannon

InterSystems Support Manager

Brendan Bannon · Oct 11, 2018

I don't think this is the right direction. I believe the problem is DUZ(2) is not always defined so queries get an <UNDEFINED> error. The user should not need to know anything about this variable, and if it is used to limit what data a person can see they REALLY should not have access to this variable.

In all other cases of mapping stuff like this the application would take care of setting the variable up and the user had no knowledge of the variable at all. Using SetServerInitCode() will make sure the variable is defined before the query is run. No changes to the class or your queries are needed.

Brendan

Brendan Bannon · Oct 9, 2018

Charles

DUZ(2) is a fileman variable. Before you can make use of this table you need to run some fileman code that will populate this variable. You can setup the system to execute the code for people trying to run queries over xDBC or from the Portal by using the following command

$SYSTEM.SQL.SetServerInitCode(code)

Where code would be a COS command to setup any needed variable.

brendan

Brendan Bannon · Jul 31, 2018

Hi Jo

the second example talks about this very case, it is limited to 2 namespaces, but you can generalize the code to look at as many Namespaces as you want.

The class is: Mapping.TwoNamespacesOneGlobal.xml

There is a link at the end of the article to let you download a zip file with all my examples. If that does not work please let me know and I can send it to you directly.

brendan

Brendan Bannon · Jun 11, 2018

Felipe

Something is messed up with my Studio so I can't open the wizard and get you a picture either. I can give you the storage in XML format.

In this example we have 5 properties. The IDKey, used as the subscript, 2 properties stored in the person global and 2 properties stored in the emp global.

The storage is setup to loop over the ^person global. We then use the Retrieval Code for the Title and HireDate Properties in the Data section of the storage to get the data out of the ^emp global

Here is my data:

^person(1)="Kaitlin^56009"
^person(2)="Melissa^56528"
^person(3)="Robin^57079"
^person(4)="Kieran^58210"

^emp(1)="Queen for the Day^64815"
^emp(2)="First Bride's Maid^64815"
^emp(3)="Second Bride's Maid^64815"
^emp(4)="Asher Boy^64815"

and here is my storage:

<SQLMap name="Map1">
<Data name="DOB">
<Delimiter>"^"</Delimiter>
<Piece>2</Piece>
</Data>
<Data name="HireDate">
<RetrievalCode> s {*}=$P(^emp({L1}),"^",2)</RetrievalCode>
</Data>
<Data name="Name">
<Delimiter>"^"</Delimiter>
<Piece>1</Piece>
</Data>
<Data name="Title">
<RetrievalCode> s {*}=$P(^emp({L1}),"^",1)</RetrievalCode>
</Data>
<Global>^person</Global>
<Structure>delimited</Structure>
<Subscript name="1">
<Expression>{EmpID}</Expression>
</Subscript>
<Type>data</Type>
</SQLMap>
<StreamLocation>^Mapping.SimpleS</StreamLocation>
<Type>%CacheSQLStorage</Type>

If you need help with an specific example send an email to Support@intersystems.com and I can work on that with you.

brendan

Brendan Bannon · Apr 12, 2018

I put in a request for Documentation to correct this.

Brendan Bannon · Feb 21, 2018

Yes I would like a better understanding of the difference as well.

Sadly I just try both and see if either change the query.

brendan

Brendan Bannon · Feb 21, 2018

Hello I am trying to get a good example, but for now I will just try to give you a general description. As the Docs say this hint is used in a subquery. At compile time the optimizer has 2 options for dealing with a subquery: 1) Treat it as a black box, executing the subquery independently from the other query and then applying the results to the outer query 2) rewrite the query incorporating the subquery into the outer query as a JOIN and combining conditions from the subquery to the WHERE clause of the outer query. If you look at the show plan of a query you can see which option the optimizer is taking. If you see a section of the plan with a heading of SubQuery then you know it is option 1. If the query does not have a Subquery section we are in option 2. You can see that with the following query: select home_state, avg(age) from sample.employee e where home_state = ANY (select MAX(v.address_state) from sample.vendor v where balance > 0) group by home_state will be rewritten while: select home_state, avg(age) from sample.employee e where home_state = ANY (select MAX(v.address_state) from %NOMERGE sample.vendor v where balance > 0) group by home_state will keep the subquery.