Question
Robert Cemper · Aug 28, 2020

Effective use of Collection Indexing and Querying Collections through SQL

Triggered by a question placed by @Kurro Lopez  recently 
I took a closer look at the indexing of collections.
My simple test setup is a serial class and a persistent class with a list of this serial.

Class rcc.IC.serItem Extends (%SerialObject, %Populate)
{ Property Subject As %String [ Required ]; 
  Property Change As %TimeStamp [ Required ]; 
  Property Color As %String(COLLATION = "EXACT", 
     VALUELIST = ",red,white,blue,yellow,black,unknown") [ Required ];
}
Class rcc.IC.ItemList Extends (%Persistent, %Populate) [ Final ]
{ Property Company As %String [ Required ]; 
  Property Region As list Of %String(COLLATION = "EXACT", POPSPEC = ":4",
     VALUELIST = ",US,CD,MX,EU,JP,AU,ZA") [ Required ];
  Property Items As list Of DC.serItem(POPSPEC = ":4") [ Required ];
 
  Index xitm On Items(ELEMENTS);
  Index ycol On Items(ELEMENTS).Color;
}

Related Docs
Index xitm holds the complete serial element. !!
With some records generated by %Populate utility  I could place this query

Select ID,Company from rcc_IC.ItemList
Where FOR SOME %ELEMENT(rcc_IC.ItemList.Items) ($list(%Value,3) in ('blue','yellow'))

This works OK but disassembling every serial object wasn't very promising for my performance considerations.
So I followed a hit from @Dan Pasco  recently seen in this forum a few days ago,
and expecting better performance I added 

Index ycol On Items(ELEMENTS).Color;

The result was rather disappointing.
No improvement.
Investigation of the query plan showed that the new index was just ignored.


 After some trials, this query satisfied my needs

Select ID,Company 
from %IGNOREINDEX xitm rcc_IC.ItemList
Where FOR SOME %ELEMENT(rcc_IC.ItemList.Items) ('blue,yellow' [ %Value )

with

During the investigation with many variations I found this rule:

IF you have more than one ELEMENT index on the same property the 
query generator always takes the alphabetic first index it finds.
And you have to explicitly exclude a non-fitting index.

As  there is no hint in the documentation I would like to know:

Is this observation correct or is it just an accidental effect in my case?

As ELEMENT index was designed for List of %String  I understand that  having
more than one index was just an unlikely case at the time of design.

Code is available in Open Exchange here

80
0 1 5 187
Log in or sign up to continue

Replies

A test with 102,794 rows confirmed the performance difference.
#1)  on index xitm
Rows selected: 61320 Global Ref: 488,861 Commands: 5.244,585 Cached Query: %sqlcq.CACHE.cls16

#2) on index ycol
Rows selected: 61320 Global-Ref: 133,812 Commands: 3.909,205 Cached Query: %sqlcq.CACHE.cls13 

Reducing the query output to SELECT COUNT(ID) makes the difference even more dramatic
#1) Performance: 0.457 sec  Global-Ref: 574,193 Commands: 2.138,695
#2) Performance: 0.082 sec  Global-Ref: 205,973 Commands:   724,288

Hi Robert.

I also got carried away with this question and found two more ways to use indexes for a list of serial objects, and you can explicitly specify specific fields in the query, rather than $list (%Value,3).

The speed may be not always the best, but I did the best I could. I tested on Caché (perhaps something has been improved in IRIS in this regard?)

If you're interested, I can share it.

Yes, pls. share!
I always liked to see not just the ONE and ONLY solution but a choice.  
 

I keep my promise (yes, it wasn't an April Fool's joke ;)

  • The first way is associated with a dummy field for the sake of being able to override the final BuildValueArray method and avoid the following error
    ERROR #5272: Can't change final 'Method' : 'BuildValueArray')
    Class rcc.IC.ItemList Extends (%Persistent%Populate) [ Final ]
    {
    
    Index xitmp On (ItemsP(KEYS), ItemsP(ELEMENTS));
    
    Property Company As %String Required ];
    
    Property Region As list Of %String(COLLATION "EXACT"POPSPEC ":4"VALUELIST ",US,CD,MX,EU,JP,AU,ZA") [ Required ];
    
    Property Items As list Of rcc.IC.serItem(POPSPEC ":4") [ Required ];
    
    Property ItemsP As %String(COLLATION "EXACT") [ CalculatedPrivateReadOnlyRequiredSqlComputeCode = {{*} {Items}}, SqlComputed ];
    
    ClassMethod ItemsPBuildValueArray(
      value,
      ByRef arrayAs %Status
    {
      ptr=0
      while $listnext(value,ptr,val){
        v=$li(val,1),
          array($li(v,1))="Subject",
          array($li(v,2))="Change",
          array($li(v,3))="Color"
      }
      q $$$OK
    }
    
    }
    select count(IDfrom rcc_IC.ItemList where FOR SOME %ELEMENT(ItemsP) (%key in ('blue','yellow') and %value='Color')

    or if need to find values in any fields

    select count(IDfrom rcc_IC.ItemList where FOR SOME %ELEMENT(ItemsP) (%key in ('blue','yellow'))

    For the sake of speed, you can store in the index not all the fields of the serial class, i.e.:

    Index xitmp On ItemsP(KEYS);
    
    ClassMethod ItemsPBuildValueArray(
      value,
      ByRef arrayAs %Status
    {
      ptr=0
      while $listnext(value,ptr,val){
        v=$li(val,1),
          array($li(v,3))="" ; only Color
      }
      q $$$OK
    }
    select count(IDfrom rcc_IC.ItemList where FOR SOME %ELEMENT(ItemsP) (%key in ('blue','yellow'))

    You can also add more dummy fields and accordingly indexes to cover all possible queries.


  • The second way involves changing the storage and creating a virtual table
    Class rcc.IC.ItemList Extends (%Persistent%Populate) [ Final ]
    {
    
    Index xitm On Items(ELEMENTS).Color;
    
    Index xitm1 On (Items(ELEMENTS).Color, Items(KEYS));
    
    Property Company As %String Required ];
    
    Property Region As list Of %String(COLLATION "EXACT"POPSPEC ":4"VALUELIST ",US,CD,MX,EU,JP,AU,ZA") [ Required ];
    
    Property Items As list Of rcc.IC.serItem(POPSPEC ":4"STORAGEDEFAULT "array") [ Required ];
    
    }
    select count(distinct IDfrom rcc_IC.ItemList where ItemList_Items->Items_Color in ('blue','yellow') -- index "xitm1" is used

    or even faster

    select count(IDfrom rcc_IC.ItemList where FOR SOME %ELEMENT(Items) (%Value in ('blue','yellow')) -- index "xitm" is used

    Of course, the data can be accessed from both tables, just do not forget about SetCollectionProjection GetCollectionProjection (for more information, see my article SQL index for array property elements)

I miss a button  "Excellent Answer"    in the community

Big Thanks!