Question
· Jun 3

What's this odd behavior with "select count()" and a Unique Index?

Hello,

I have a class with a "Unique" index (pxfactidIndex) on a %Numeric property (pxfactid) (partially-edit code snippet below):

Property pxfactid As %Library.Numeric(MAXVAL = 9223372036854775807, MINVAL = -9223372036854775808, SCALE = 0) [ SqlColumnNumber = 7 ];
Index pxfactidIndex On pxfactid [ Unique ];
Storage Default
{
<Data name="FactDefaultData">
<Value name="1">
<Value>pysubjectid</Value>
</Value>
...
<Value name="6">
<Value>pxfactid</Value>
</Value>
...
</Data>
<DataLocation>^CRMBI.FactD</DataLocation>
<DefaultData>FactDefaultData</DefaultData>
<ExtentLocation>^CRMBI.Fact</ExtentLocation>
<ExtentSize>3521840</ExtentSize>
<IdFunction>sequence</IdFunction>
<IdLocation>^CRMBI.FactD</IdLocation>
<Index name="DDLBEIndex">
<Location>^CRMBI.FactI.2</Location>
</Index>
<Index name="IDKEY">
<Location>^CRMBI.FactD</Location>
</Index>
<Index name="pxfactidIndex">
<Location>^CRMBI.Fact.4</Location>
</Index>
...
<IndexLocation>^CRMBI.FactI</IndexLocation>
<Property name="%%ID">
<AverageFieldSize>4.99</AverageFieldSize>
<Histogram>$lb(.06666666666666666667,1,0,$lb(27673,228574,456707,618301,892202,1107091,1405679,1604163,1738513,2015749,2270724,2530345,2830827,3047237,3296375,3504324),$lb(1,0,0,0,0,1,1,1,0,1,1,1,0,1,1),$lb(842479159,926299955,842544439,842151989,875902519,875902519,909195315,909195315,943272498,943272498,825307191,825243440,875574582,875574582,909128753,909128753,926103605,825701176,842019125,808531255,842477623,842477623,892547123,892547123,942878776,842543920,858797111,808728370,842610227,842610227,892351539,859123764))</Histogram>
<Selectivity>1</Selectivity>
</Property>
<Property name="pxfactid">
<AverageFieldSize>9.99</AverageFieldSize>
<Histogram>$lb(.06666666666666666667,1,0,$lb(-8377896048695389137,-7487164305435710908,-6406061662044370629,-5112561079609519767,-4588722295094185245,-4512887624516713113,-3293381287254976544,-1753938466698510428,-1561375857877981192,-211989232600898713,2857966673366783778,4457490938253726327,5246678144060189771,6591622321393794159,8944949574919674333,8944949574921469469),$lb(1,1,1,1,3,1,1,2,1,0,0,0,0,0,11),$lb(758657847,942880567,926169143,926169143,909389878,909389878,892416306,892416306,875903032,943208242,825374776,875901234,858929459,858929459,825701683,926233401,892743987,825570865,842084665,758264113,842544439,842544439,875836727,875836727,892482614,892482614,909457713,909457713,943273012,825833015,842085430,943273012))</Histogram>
<Selectivity>0.0000283942484610317%</Selectivity>
</Property>

I see the following odd behavior when counting how many records have that property set to null:

  • The first query (17) returns the number of records in the table. The 2nd query (18) returns what I expect: all records have non-null values for that "pxfactid" property. However, when I run the 3rd query (19) which counts all records with non-null values in that field, I get a much smaller, unexpected, number.

  • The 4th query (20) counts the number of records in a subquery that returns all non-null values for that property in the table and returns the expected number that matches the total number of records in the table.

Does anyone have any idea of what type of coding error would produce the odd behavior seen in query 19 (wrong number of records with non-null values in that indexed property)?

Note that the values in that field have a large range; here's a sample of some of them:

Thanks

Product version: IRIS 2024.1
$ZV: IRIS for Windows (x86-64) 2024.1.1 (Build 347U) Thu Jul 18 2024 17:35:51 EDT
Discussion (7)3
Log in or sign up to continue

Hi Jean,

at first glance I'd expect the query plan for #18. and #19. should be quite similar
:SQL offers 3 levels to see the query plan

  • show                  
    • Show the execution plan for the current statement.
  • show pl[an] [v[erbose]]                          
    • Shows the current statement execution plan.                          
    • If the verbose qualifier is used, show all the module                          
    • details for the current statement's execution plan;                          
    • Otherwise, display only the top-level module details                        
    •  by default.
  • show planalt [v[erbose]]                          
    • Shows the current statement alternate execution plans.                          
    • If the verbose qualifier is used, show all the module                          
    • details for the current statement's execution plan and                          
    • all alternate plans; Otherwise, display only the                          
    • top-level module details by default..

This might offer a chance to identify the difference.

There's still an - unlikely - chance that #19  runs on some broken cached query,
that never was updated. so clear cached queries might be a possible solution. Not an explanation 

Best regards, Robert

Hi again Robert,

I also would have expected the plans for both 18 and 19 to be the same, but "quelle surprise!":

18: Beautiful, expected plan:

19: What? I must have messed something up in the evolution of this class to have such an expensive plan that produces incorrect results:

I'll investigate further to figure out how to address that second plan, but any input from you or others would be welcome.

Thanks and best regards.

I think the difference in relative costs makes sense. In the "WHERE pxfactid IS NULL" query, you are only looping over a specific subsection of the global. This is similar if you have a query that says SELECT COUNT(*) FROM SomeTable WHERE pxfactid  = 'hello'

In the second query, you are not looping over subrange of nodes that have a specific value, instead you are looping over the entirety of the index and testing whether its content do not match null. This is equivalent to SELECT COUNT(*) FROM SomeTable WHERE pxfactid  != 'hello'

In any unclear situation, when incorrect data is returned, two things need to be done first: rebuild the indexes and tune tables:

Since you have a large range of values, I would pay attention to INDEXNULLMARKER: Indexing a NULL