Written by

Question Arto Alatalo · Jan 30, 2020

Query Plan uses temp-file, why?

A quick question regarding to SQL Query Plan:
Why these two requests have different plans, in particularly, why second request needs temp file? To me, temp file is a bad thing which should be avoided, right?

  1. select * from Test.Log where cdate = 1
  2. select * from Test.Log where cdate > 1

Plan for 1:

Read index map Test.Log.cdateIndex, using the given %SQLUPPER(cdate), and looping on ID.
For each row:
 Read master map Test.Log.IDKEY, using the given idkey value.
 Output the row.

Plan for 2:

Call module B, which populates bitmap temp-file A.
Read bitmap temp-file A, looping on ID.

For each row:

 Read master map Test.Log.IDKEY, using the given idkey value.
 Output the row.
module B
Read index map Test.Log.cdateIndex, looping on %SQLUPPER(cdate) (with a range condition) and ID.

For each row:
 Add ID bit to bitmap temp-file A.

 

The class:

Class Test.Log Extends %Persistent
{

Property message As %String;

Property cdate As %String;

Index cdateIndex On cdate;

Storage Default
{
...
<Type>%Library.CacheStorage</Type>
}

}

Comments

Vitaliy Serdtsev · Jan 30, 2020

Are you sure that the "cdate" type is a string ? Maybe %Date or %PosixTime?

Even if leave a string, the result is highly dependent on ExtentSize. For example, if ExtentSize=10, the plans match.

0
Arto Alatalo  Jan 30, 2020 to Vitaliy Serdtsev

Edit: answered by Robert Cemper above.

Let's say the type is string and ExtentSize is 1000. Why second query needs temp file? I don't see why condition 'equal to' should be executed in different way than 'greater than'. 

0
Arto Alatalo  Jan 30, 2020 to Robert Cemper

Good point! Thank you Robert.

0
Robert Cemper · Jan 30, 2020

Assuming that cdate has  a rather narrow selectivity

#1) looking for a distinct value (1)  gives a rather moderate result set. So the index global might not consume too much buffer blocks.

#2) looking for >1  (combined with a  significant EXTENTSIZE may create a huge resultset.
So it is wise to scan the index global first {typically less blocks than data global] and keep the hits sorted by ID.
Then you run with this hit list along  the master map through your data global. Assuming that data require 
significantly more block reads than the index global.

With a reasonable amount of Global buffers, your temp Global even might not see the Disk storage at all. 

You see it's a bunch of assumptions hidden in this query plan.
The main goal, in any case, is to have as less "disk" access as possible.
I quoted "disk" as storage has so many kinds of variations that it just stands as a synonym.
 

0