Cache Dynamic SQL Pagination

Would like to know if there is an alternative or better way to paginate through a dataset using dynamic SQL than what I am using below. The problem is that as the potential pool of data gets larger, this code slows down to the point of not being useable. In analyzing each line of code below, it appears the slow down is related to the initial rset.%Next() iteration. Is there anything available which does not require a subquery/%VID such as a simple LIMIT/OFFSET?

My code is similar to :

s sql=##class(%SQL.Statement).%New()

s query="SELECT *,%VID FROM (SELECT prop FROM table WHERE prop=x) WHERE %VID BETWEEN 1 AND 100"             

s sc=sql.%Prepare(query)

s rset=sql.%Execute()

while rset.%Next() {.....

  • + 1
  • 0
  • 672
  • 9
  • 1

Answers

Thanks for the suggestions, but I'm afraid none of these will fix my problem.

- ScrollableResultSet has the most promise, but still does not satisfy what I need. The data is constantly being updated so using this would not be viable.

-The second suggestion is what I am doing above. Using %VID with a subquery is just too slow when dealing with large datasets. For ex. The following query has a potential pool of 1mil results. SELECT *,%VID FROM (SELECT prop FROM table WHERE prop=x) WHERE %VID BETWEEN 1 AND 100. Eventhough I am only asking for the first 100, there is still a major performance hit when executing the first rset.%Next() due to, what I assume is, the code trying to find the 100 records I am requesting out of the 1 mil records.

- The third article mentions a few options of which some are not available in Cache, such as LIMIT and OFFSET. Also using Cursors or Keysets would not be viable as my application allows jump to, sorting and filtering functionality.

- And the 4th talks about using LIMIT and OFFSET which are, again, not available in Cache.

Any other thoughts?

So, in order.

- The third article mentions a few options of which some are not available in Cache, such as LIMIT and OFFSET.

Caché has analogues - TOP N and %vid, which with more than replace the LIMIT/OFFSET. In the second link this is discussed in detail.

- And the 4th talks about using LIMIT and OFFSET which are, again, not available in Cache.

The essence of the article is to replace query

SELECT user_id, external_id, name, metadata, date_created
FROM users
ORDER BY user_id ASC
LIMIT 50 000 000, 10 000; --- 5 000th page * 10 000 page size

10 000 rows in set (40.81 sec)

to

SELECT user_id, external_id, name, metadata, date_created
FROM users
WHERE user_id > 51 234 123 --- value of user_id for 50 000 000th record
ORDER BY user_id ASC
LIMIT 10 000

10 000 rows in set (0.03 sec)
Eventhough I am only asking for the first 100, there is still a major performance hit when executing the first rset.%Next() due to, what I assume is, the code trying to find the 100 records I am requesting out of the 1 mil records.

In this case, the following query is sufficient:

SELECT TOP 100 prop FROM table WHERE prop=?
-The second suggestion is what I am doing above. Using %VID with a subquery is just too slow when dealing with large datasets.

Is there an index for "prop"?

You tuned the table?

I have all works quickly:

Class dc.test Extends %Persistent
{

Index iprop On prop;

Property prop As %String;

ClassMethod Fill(
  Nrow = {1e6},
  Npage = {1e3})
{
  DISABLE^%NOJRN
    ,..%KillExtent()
  
  time=$zh
    ,^dc.testD=Nrow
  
  id=1:1:Nrow {
    p=id-1\Npage+1
      ,v=id-1#Npage+1
      ,val=p_":"_v
      ,^dc.testD(id)=$lb("",val)
      ,^dc.testI("iprop",$$$SQLUPPER(val),id)=""
  }
  
  "(Fill) time = ",$zh-time," s.",!!
  
  zw:Nrow<=50 ^dc.testD,^dc.testI
  
  ENABLE^%NOJRN
    ,$system.SQL.TuneTable($classname(),$$$YES)
    ,$system.OBJ.Compile($classname(),"cu-d")
}

ClassMethod Query(
  As %TinyInt 1,
  prop As %String,
  rownum1 As %Integer,
  rownum2 As %Integer)
{
  sql(1)="select *,%vid from (select %ID,prop from dc.test where prop %startswith ?) where %vid between ? and ?"
    ,sql(2)="select *,%vid from (select %ID,prop from %ignoreindex iprop dc.test where prop like ?) where %vid between ? and ?"
    ,sql(3)="select *,%vid from (select top ? %ID,prop from %ignoreindex iprop dc.test where prop like ? order by %ID desc) order by %vid desc"

    ,time=$zh
    ,rs=$s(q=3:##class(%SQL.Statement).%ExecDirect(,sql(q),rownum1,prop),
             1:##class(%SQL.Statement).%ExecDirect(,sql(q),prop,rownum1,rownum2))
    
  'rs.%SQLCODE {
    while rs.%Next() {
      /*
      s id=rs.%Get("ID")
        ,prop=rs.%Get("prop")
      w id," ",prop,!
      */
      rs.%Print()
    }
  }
  "(",$lts($lb(q,prop,rownum1,rownum2)),") time = ",$zh-time," s.",!!
}

/// d ##class(dc.test).Test()
ClassMethod Test()
{
  ..Fill()
    ,..Query(1,"1000:",111,111+16)
    ,..Query(2,"%12%",111,111+16)
    ,..Query(1,"1",111984,111984+16)

    ,..Query(2,"%12%",39584,39584+16) ;# slow (last 17)
    ,..Query(3,"%12%",17,"")          ;# fast (last 17)
}
}

Result:

USER>##class(dc.test).Test()
(Fill) time = 1.277645 s.
 
999111 1000:111 111
999112 1000:112 112
999113 1000:113 113
999114 1000:114 114
999115 1000:115 115
999116 1000:116 116
999117 1000:117 117
999118 1000:118 118
999119 1000:119 119
999120 1000:120 120
999121 1000:121 121
999122 1000:122 122
999123 1000:123 123
999124 1000:124 124
999125 1000:125 125
999126 1000:126 126
999127 1000:127 127
(1,1000:,111,127) time = .084489 s.
 
5128 6:128 111
5129 6:129 112
5212 6:212 113
5312 6:312 114
5412 6:412 115
5512 6:512 116
5612 6:612 117
5712 6:712 118
5812 6:812 119
5912 6:912 120
6012 7:12 121
6112 7:112 122
6120 7:120 123
6121 7:121 124
6122 7:122 125
6123 7:123 126
6124 7:124 127
(2,%12%,111,127) time = .091251 s.
 
999984 1000:984 111984
999985 1000:985 111985
999986 1000:986 111986
999987 1000:987 111987
999988 1000:988 111988
999989 1000:989 111989
999990 1000:990 111990
999991 1000:991 111991
999992 1000:992 111992
999993 1000:993 111993
999994 1000:994 111994
999995 1000:995 111995
999996 1000:996 111996
999997 1000:997 111997
999998 1000:998 111998
999999 1000:999 111999
1000000 1000:1000 112000
(1,1,111984,112000) time = .66504 s.
 
999121 1000:121 39584
999122 1000:122 39585
999123 1000:123 39586
999124 1000:124 39587
999125 1000:125 39588
999126 1000:126 39589
999127 1000:127 39590
999128 1000:128 39591
999129 1000:129 39592
999212 1000:212 39593
999312 1000:312 39594
999412 1000:412 39595
999512 1000:512 39596
999612 1000:612 39597
999712 1000:712 39598
999812 1000:812 39599
999912 1000:912 39600
(2,%12%,39584,39600) time = 1.946264 s.
 
999121 1000:121 17
999122 1000:122 16
999123 1000:123 15
999124 1000:124 14
999125 1000:125 13
999126 1000:126 12
999127 1000:127 11
999128 1000:128 10
999129 1000:129 9
999212 1000:212 8
999312 1000:312 7
999412 1000:412 6
999512 1000:512 5
999612 1000:612 4
999712 1000:712 3
999812 1000:812 2
999912 1000:912 1
(3,%12%,17,) time = .089032 s.

I'm sure I'm missing something, but so far, I'm just not seeing how this would speed up my query.

My reason for thinking the 4th article wouldn't work for me is because I am allowing the user to jump to a page, and with filtering and sorting also available (as well as a constantly changing database), I don't see how I could determine the keyval/id to use in the query.

Also, this query will work when grabbing the first page, but not for subsequent pages SELECT TOP 100 prop FROM table WHERE prop=?

The TOP N and %vid are very slow when dealing with a large result set.

The example you give is great, but I don't think it has 1mil possible matches which are being filtered down to 100 results (not 100% sure about that statement..:)). This is the problem I am having with using TOP N and %vid.

For example, say I want to grab all data for the month of July that match some specific search parameters (which happens to result in 1,000,000 matched items), and since I am currently on page 5 (and each page displays 100 results), I want to offset the results by 400 items. The first %Next() takes a tremendous performance hit.

s sql=##class(%SQL.Statement).%New()

s query="SELECT *,%VID FROM (SELECT prop FROM table WHERE prop=val) WHERE %VID BETWEEN 401 AND 500"                                                  

s sc=sql.%Prepare(query)

s rset=sql.%Execute()

s t=$zh d rset.%Next() w $zh-t
6.105871                     

Could you provide a complete example similar to mine (code + data + queries with sorting and filtering) that you have a issue with?

How much all rows in the table?
How much RAM?
Which version $zv?

Pardon the formatting but please see below. Conclusion - Every time I add 100,000 to the table, it takes longer to perform the initial %Next(), even though I'm still only asking for the first 5 results. This is a simple table with a simple query. The one I'm dealing with in my app is a much more complex table and query with an ORDER BY clause which slows it down even more, but the example below gives an idea of what I am talking about.

Class mp.test Extends (%Persistent)
{
  Property prop1;
  Property prop2;
  Property prop3;
  Index idx3 On (prop1, prop2, prop3);

ClassMethod Fill(total)
{
    ;d DISABLE^%NOJRN
    ;d ..%KillExtent()
    s prop1="name"
    s prop2="prop2"
    f i=1:1:total {
         s prop3=$r(30)
         s id=$g(^mp.testD)+1
         s ^mp.testD(id)=$lb(""," "_prop1," "_prop2," "_prop3)
         ^mp.testI("idx3",$$$SQLUPPER(prop1),$$$SQLUPPER(prop2),$$$SQLUPPER(prop3),id)=""
         ^mp.testD=$g(^mp.testD)+1
    }
}

ClassMethod Query()
{
    s sql=##class(%SQL.Statement).%New()
    query = "SELECT *,%vid FROM (SELECT %ID,prop3 FROM mp.test WHERE prop1='name' AND prop3>='1' AND prop3<='30' AND prop2='prop2') WHERE %VID BETWEEN 1 AND 5"
    s sc=sql.%Prepare(query)
    i $$$ISOK(sc) {
         s rset=sql.%Execute()
         i 'rset.%SQLCODE {
              s t1=$zh
             d rset.%Next()
            w $zh-t1,!
            d rset.%Display()
         }     
    }
 }
}
 

TEST 1 - Loading 100,000 items into table. Query takes .314 secs

user>D ##class(mp.test).Fill(100000)                                      
user>D ##class(mp.test).Query()    
.314167 secs                            
ID prop3 Literal_3                      
2  19 2                                 
3  19 3                                 
4  19 4                                 
5  15 5                                 
                                        
5 Rows(s) Affected                      

------------------------------------------------

TEST 2 - adding another 100,000 to table. Same query takes .64secs

user>D ##class(mp.test).Fill(100000)                                       
user>D ##class(mp.test).Query()    
.640992 secs                            
ID prop3 Literal_3                      
2  15 2                                 
3  23 3                                 
4  26 4                                 
5  19 5                                 
                                        
5 Rows(s) Affected  

--------------------------------------------------------

TEST 3 - another 100,000. query takes .865 secs

user>D ##class(mp.test).Fill(100000)                                         
user>D ##class(mp.test).Query()     
.865654 secs                             
ID prop3 Literal_3                       
2  15 2                                  
3  23 3                                  
4  26 4                                  
5  19 5                                  
                                         
5 Rows(s) Affected                                            

----------------------------------------

TEST 4 - another 100,000 takes 1.16secs

user>D ##class(mp.test).Fill(100000)                                       
user>D ##class(mp.test).Query()    
1.169932 secs                           
ID prop3 Literal_3                      
2  15 2                                 
3  23 3                                 
4  26 4                                 
5  19 5                                 

--------------------------------------------

TEST 5 - another 100,000. query takes 1.44 secs

user>D ##class(mp.test).Fill(100000)                                       
user>D ##class(mp.test).Query()    
1.44849 secs                            
ID prop3 Literal_3                      
2  15 2                                 
3  23 3                                 
4  26 4                                 
5  19 5                                 
                                        
5 Rows(s) Affected                      

And if so?

query "select %ID,prop3 from mp.test where %ID in (SELECT * FROM (SELECT %ID FROM mp.test WHERE prop1='name' AND prop3>='1' AND prop3<='30' AND prop2='prop2'WHERE %VID BETWEEN 1 AND 5)"

Whoa!!! Upon initial testing that is lightning fast!!! Went from +1s down to .001. I'm going to fool around with this some more but so far this is exactly what I was looking for. Thank you so much!!!!

SELECT %ID,prop3 FROM mp.test WHERE %ID IN (
  SELECT * FROM (
    SELECT TOP ALL %ID FROM mp.test WHERE prop1='name' AND prop3>='1' AND prop3<='30' AND prop2='prop2' ORDER BY prop3 DESC
  WHERE %VID BETWEEN 1 AND 5
) -- ORDER BY prop3 DESC