reinhard lebensorger · Sep 17, 2019

Cache and SQL Performance

We often have troubles with performance when joining  tables  with about 10 million rows together.

Sometimes we can't say what the cache engine is doing, it's just sucks and never finishes.

Then we have to rebuild the query and loop first on Orders and then on OrderPositions.

this is what we want:

EDITED: the join was only forgotten

select p.hpos,p.upos,p.spos, p.Fa,p.Auftnr,a.AA,
auf.aufpos p ,Auf.Auftrag a 
where a.Fa=1 and a.Fa=p.Fa
and a.AuftDat > ($HOROLOG-1000)
and a.Auftnr=p.Auftnr

this query does not finish...

then we rebuild and work with

select o.col1, o.col2,
from orders o 
where o.orderDate > ($H-1000)

and then loop :o(


indizes are correct, all join cols are indexed

i can't believe that SQL isn't available for us when we join tables with "more" data because exactly for this purpose databases have been developed

maybe there are hints like on other databases like "UPDATE STATISTICS"

also: write your experience, do your queries also "fail" by never return?

my suspicion goes in direction queryoptimizer, maybe the system has wrong statistics or something else. 

best regards, reinhard

Do you have problems with SQL when joining "bigger" tables

Register or login to poll

No - Never
Yes - Often
0 285
Discussion (5)1
Log in or sign up to continue

SQL Performance is a very big topic. So many ways how to achieve what you need. Just indexes not enough, you need right indexes and right queries. I think it will be much more productive if you provide some examples of your classes with populate, and queries. So, anybody will be able to reproduce your issue and offer some ways how to solve it.

To add on what Dmitry said.

We need to see the query, its plan and corresponding classes with selectivity information.

For example, do you really want to run query without join condition?

select o.col1, o.col2, op.partnum, op.amount
from orders o join orderpositions op
where o.orderDate > $H-1000

In the time it took me to write this @Alexander Koblov  also mentioned that this is a "JOIN with no condition" - he's absolutely right.

It's a challenge. Give a reproducible example for "play around". There are a bunch of hints: Query Optimization.

@Kyle.Baxter will also be interested in your issue.

@Vitaliy.Serdtsev  - you were right to summon me here!

SQL optimization is a HUGE topic (as has been mentioned) but InterSystems products are VERY good at JOIN-ing large tables.  Here's the issue, you're NOT doing a JOIN, you're doing a Cartesian Product.  Your query:

select o.col1, o.col2, op.partnum, op.amount
from orders o join orderpositions op
where o.orderDate > $H-1000

Is saying to JOIN EVERY order with EVERY order position over the last 3 years (give or take).  As you are not restricting how you're JOIN-ing the two tables together, there's not much we can do to optimize this.  That said, when bringing up these issues (and you should! either we help you write better code or we fix problems in our code!) you should provide your table definition and a query plan.  I think if you look at the query plan (which you can get from the management portal) you'll see that the problem is you're missing your JOIN condition (something like ON op.Order = o.ID, if I were to take a guess).  If that's not the case and this is the query you want to be running, please send over your class definitions by doing:

d $SYSTEM.OBJ.Export("User.orders.cls,User.orderpositions.cls","C:\Temp\KyleWantsThis.xml")

and I'll be happy to look at this further.  Of course, you are also very welcome to open up an issue with InterSystems Support - best support in the industry, and I'll bet a kidney on that statement.