Question
· Jul 17, 2018

Cache Dynamic SQL Pagination

Would like to know if there is an alternative or better way to paginate through a dataset using dynamic SQL than what I am using below. The problem is that as the potential pool of data gets larger, this code slows down to the point of not being useable. In analyzing each line of code below, it appears the slow down is related to the initial rset.%Next() iteration. Is there anything available which does not require a subquery/%VID such as a simple LIMIT/OFFSET?

My code is similar to :

s sql=##class(%SQL.Statement).%New()

s query="SELECT *,%VID FROM (SELECT prop FROM table WHERE prop=x) WHERE %VID BETWEEN 1 AND 100"             

s sc=sql.%Prepare(query)

s rset=sql.%Execute()

while rset.%Next() {.....

Discussion (30)3
Log in or sign up to continue

Thanks for the suggestions, but I'm afraid none of these will fix my problem.

- ScrollableResultSet has the most promise, but still does not satisfy what I need. The data is constantly being updated so using this would not be viable.

-The second suggestion is what I am doing above. Using %VID with a subquery is just too slow when dealing with large datasets. For ex. The following query has a potential pool of 1mil results. SELECT *,%VID FROM (SELECT prop FROM table WHERE prop=x) WHERE %VID BETWEEN 1 AND 100. Eventhough I am only asking for the first 100, there is still a major performance hit when executing the first rset.%Next() due to, what I assume is, the code trying to find the 100 records I am requesting out of the 1 mil records.

- The third article mentions a few options of which some are not available in Cache, such as LIMIT and OFFSET. Also using Cursors or Keysets would not be viable as my application allows jump to, sorting and filtering functionality.

- And the 4th talks about using LIMIT and OFFSET which are, again, not available in Cache.

Any other thoughts?

So, in order.

- The third article mentions a few options of which some are not available in Cache, such as LIMIT and OFFSET.

Caché has analogues - TOP N and %vid, which with more than replace the LIMIT/OFFSET. In the second link this is discussed in detail.

- And the 4th talks about using LIMIT and OFFSET which are, again, not available in Cache.

The essence of the article is to replace query

SELECT user_id, external_id, name, metadata, date_created
FROM users
ORDER BY user_id ASC
LIMIT 50 000 000, 10 000; --- 5 000th page * 10 000 page size

10 000 rows in set (40.81 sec)

to

SELECT user_id, external_id, name, metadata, date_created
FROM users
WHERE user_id > 51 234 123 --- value of user_id for 50 000 000th record
ORDER BY user_id ASC
LIMIT 10 000

10 000 rows in set (0.03 sec)
Eventhough I am only asking for the first 100, there is still a major performance hit when executing the first rset.%Next() due to, what I assume is, the code trying to find the 100 records I am requesting out of the 1 mil records.

In this case, the following query is sufficient:

SELECT TOP 100 prop FROM table WHERE prop=?
-The second suggestion is what I am doing above. Using %VID with a subquery is just too slow when dealing with large datasets.

Is there an index for "prop"?

You tuned the table?

I have all works quickly:

Class dc.test Extends %Persistent
{

Index iprop On prop;

Property prop As %String;

ClassMethod Fill(
  Nrow = {1e6},
  Npage = {1e3})
{
  DISABLE^%NOJRN
    ,..%KillExtent()
  
  time=$zh
    ,^dc.testD=Nrow
  
  id=1:1:Nrow {
    p=id-1\Npage+1
      ,v=id-1#Npage+1
      ,val=p_":"_v
      ,^dc.testD(id)=$lb("",val)
      ,^dc.testI("iprop",$$$SQLUPPER(val),id)=""
  }
  
  "(Fill) time = ",$zh-time," s.",!!
  
  zw:Nrow<=50 ^dc.testD,^dc.testI
  
  ENABLE^%NOJRN
    ,$system.SQL.TuneTable($classname(),$$$YES)
    ,$system.OBJ.Compile($classname(),"cu-d")
}

ClassMethod Query(
  As %TinyInt 1,
  prop As %String,
  rownum1 As %Integer,
  rownum2 As %Integer)
{
  sql(1)="select *,%vid from (select %ID,prop from dc.test where prop %startswith ?) where %vid between ? and ?"
    ,sql(2)="select *,%vid from (select %ID,prop from %ignoreindex iprop dc.test where prop like ?) where %vid between ? and ?"
    ,sql(3)="select *,%vid from (select top ? %ID,prop from %ignoreindex iprop dc.test where prop like ? order by %ID desc) order by %vid desc"

    ,time=$zh
    ,rs=$s(q=3:##class(%SQL.Statement).%ExecDirect(,sql(q),rownum1,prop),
             1:##class(%SQL.Statement).%ExecDirect(,sql(q),prop,rownum1,rownum2))
    
  'rs.%SQLCODE {
    while rs.%Next() {
      /*
      s id=rs.%Get("ID")
        ,prop=rs.%Get("prop")
      w id," ",prop,!
      */
      rs.%Print()
    }
  }
  "(",$lts($lb(q,prop,rownum1,rownum2)),") time = ",$zh-time," s.",!!
}

/// d ##class(dc.test).Test()
ClassMethod Test()
{
  ..Fill()
    ,..Query(1,"1000:",111,111+16)
    ,..Query(2,"%12%",111,111+16)
    ,..Query(1,"1",111984,111984+16)

    ,..Query(2,"%12%",39584,39584+16) ;# slow (last 17)
    ,..Query(3,"%12%",17,"")          ;# fast (last 17)
}
}

Result:

USER>##class(dc.test).Test()
(Fill) time = 1.277645 s.
 
999111 1000:111 111
999112 1000:112 112
999113 1000:113 113
999114 1000:114 114
999115 1000:115 115
999116 1000:116 116
999117 1000:117 117
999118 1000:118 118
999119 1000:119 119
999120 1000:120 120
999121 1000:121 121
999122 1000:122 122
999123 1000:123 123
999124 1000:124 124
999125 1000:125 125
999126 1000:126 126
999127 1000:127 127
(1,1000:,111,127) time = .084489 s.
 
5128 6:128 111
5129 6:129 112
5212 6:212 113
5312 6:312 114
5412 6:412 115
5512 6:512 116
5612 6:612 117
5712 6:712 118
5812 6:812 119
5912 6:912 120
6012 7:12 121
6112 7:112 122
6120 7:120 123
6121 7:121 124
6122 7:122 125
6123 7:123 126
6124 7:124 127
(2,%12%,111,127) time = .091251 s.
 
999984 1000:984 111984
999985 1000:985 111985
999986 1000:986 111986
999987 1000:987 111987
999988 1000:988 111988
999989 1000:989 111989
999990 1000:990 111990
999991 1000:991 111991
999992 1000:992 111992
999993 1000:993 111993
999994 1000:994 111994
999995 1000:995 111995
999996 1000:996 111996
999997 1000:997 111997
999998 1000:998 111998
999999 1000:999 111999
1000000 1000:1000 112000
(1,1,111984,112000) time = .66504 s.
 
999121 1000:121 39584
999122 1000:122 39585
999123 1000:123 39586
999124 1000:124 39587
999125 1000:125 39588
999126 1000:126 39589
999127 1000:127 39590
999128 1000:128 39591
999129 1000:129 39592
999212 1000:212 39593
999312 1000:312 39594
999412 1000:412 39595
999512 1000:512 39596
999612 1000:612 39597
999712 1000:712 39598
999812 1000:812 39599
999912 1000:912 39600
(2,%12%,39584,39600) time = 1.946264 s.
 
999121 1000:121 17
999122 1000:122 16
999123 1000:123 15
999124 1000:124 14
999125 1000:125 13
999126 1000:126 12
999127 1000:127 11
999128 1000:128 10
999129 1000:129 9
999212 1000:212 8
999312 1000:312 7
999412 1000:412 6
999512 1000:512 5
999612 1000:612 4
999712 1000:712 3
999812 1000:812 2
999912 1000:912 1
(3,%12%,17,) time = .089032 s.

I'm sure I'm missing something, but so far, I'm just not seeing how this would speed up my query.

My reason for thinking the 4th article wouldn't work for me is because I am allowing the user to jump to a page, and with filtering and sorting also available (as well as a constantly changing database), I don't see how I could determine the keyval/id to use in the query.

Also, this query will work when grabbing the first page, but not for subsequent pages SELECT TOP 100 prop FROM table WHERE prop=?

The TOP N and %vid are very slow when dealing with a large result set.

The example you give is great, but I don't think it has 1mil possible matches which are being filtered down to 100 results (not 100% sure about that statement..:)). This is the problem I am having with using TOP N and %vid.

For example, say I want to grab all data for the month of July that match some specific search parameters (which happens to result in 1,000,000 matched items), and since I am currently on page 5 (and each page displays 100 results), I want to offset the results by 400 items. The first %Next() takes a tremendous performance hit.

s sql=##class(%SQL.Statement).%New()

s query="SELECT *,%VID FROM (SELECT prop FROM table WHERE prop=val) WHERE %VID BETWEEN 401 AND 500"                                                  

s sc=sql.%Prepare(query)

s rset=sql.%Execute()

s t=$zh d rset.%Next() w $zh-t
6.105871                     

Pardon the formatting but please see below. Conclusion - Every time I add 100,000 to the table, it takes longer to perform the initial %Next(), even though I'm still only asking for the first 5 results. This is a simple table with a simple query. The one I'm dealing with in my app is a much more complex table and query with an ORDER BY clause which slows it down even more, but the example below gives an idea of what I am talking about.

Class mp.test Extends (%Persistent)
{
  Property prop1;
  Property prop2;
  Property prop3;
  Index idx3 On (prop1, prop2, prop3);

ClassMethod Fill(total)
{
    ;d DISABLE^%NOJRN
    ;d ..%KillExtent()
    s prop1="name"
    s prop2="prop2"
    f i=1:1:total {
         s prop3=$r(30)
         s id=$g(^mp.testD)+1
         s ^mp.testD(id)=$lb(""," "_prop1," "_prop2," "_prop3)
         ^mp.testI("idx3",$$$SQLUPPER(prop1),$$$SQLUPPER(prop2),$$$SQLUPPER(prop3),id)=""
         ^mp.testD=$g(^mp.testD)+1
    }
}

ClassMethod Query()
{
    s sql=##class(%SQL.Statement).%New()
    query = "SELECT *,%vid FROM (SELECT %ID,prop3 FROM mp.test WHERE prop1='name' AND prop3>='1' AND prop3<='30' AND prop2='prop2') WHERE %VID BETWEEN 1 AND 5"
    s sc=sql.%Prepare(query)
    i $$$ISOK(sc) {
         s rset=sql.%Execute()
         i 'rset.%SQLCODE {
              s t1=$zh
             d rset.%Next()
            w $zh-t1,!
            d rset.%Display()
         }     
    }
 }
}
 

TEST 1 - Loading 100,000 items into table. Query takes .314 secs

user>D ##class(mp.test).Fill(100000)                                      
user>D ##class(mp.test).Query()    
.314167 secs                            
ID prop3 Literal_3                      
2  19 2                                 
3  19 3                                 
4  19 4                                 
5  15 5                                 
                                        
5 Rows(s) Affected                      

------------------------------------------------

TEST 2 - adding another 100,000 to table. Same query takes .64secs

user>D ##class(mp.test).Fill(100000)                                       
user>D ##class(mp.test).Query()    
.640992 secs                            
ID prop3 Literal_3                      
2  15 2                                 
3  23 3                                 
4  26 4                                 
5  19 5                                 
                                        
5 Rows(s) Affected  

--------------------------------------------------------

TEST 3 - another 100,000. query takes .865 secs

user>D ##class(mp.test).Fill(100000)                                         
user>D ##class(mp.test).Query()     
.865654 secs                             
ID prop3 Literal_3                       
2  15 2                                  
3  23 3                                  
4  26 4                                  
5  19 5                                  
                                         
5 Rows(s) Affected                                            

----------------------------------------

TEST 4 - another 100,000 takes 1.16secs

user>D ##class(mp.test).Fill(100000)                                       
user>D ##class(mp.test).Query()    
1.169932 secs                           
ID prop3 Literal_3                      
2  15 2                                 
3  23 3                                 
4  26 4                                 
5  19 5                                 

--------------------------------------------

TEST 5 - another 100,000. query takes 1.44 secs

user>D ##class(mp.test).Fill(100000)                                       
user>D ##class(mp.test).Query()    
1.44849 secs                            
ID prop3 Literal_3                      
2  15 2                                 
3  23 3                                 
4  26 4                                 
5  19 5                                 
                                        
5 Rows(s) Affected                      

 

Source code

USER>##class(mp.test).Fill(5000000)

USER>##class(mp.test).Query()
count=3833346

first 5 [1:5]
ID prop3
3 3
4 3
24 3
30 3
97 3

5 Rows(s) Affected
.000328 secs

last 5 [3833342:3833346]
ID prop3
4999798 1
4999817 1
4999836 1
4999866 1
4999947 1

5 Rows(s) Affected
2.884304 secs

PS: for those who put a minus. May I ask why?

Let's change the data and then test it with an order by:
Prop1 is i %Integer with valuelist of 1 to 10. Populated randomly, with 60% of the values being 2.
Prop2 is can stay as is and not relevant in this test.
Prop3 is a %String of length 20. Populate with 4M random values.
Prop1 has a bitmap index.
Prop2 has a bitmap index.
Prop3 has an index.

To use order by, it needs to look like this, and this is a lot slower:
select %ID,prop3 
from mp.test 
where %ID in 
  (SELECT * FROM 
    (SELECT TOP ALL %ID
     FROM mp.test 
     WHERE prop1 = 2
     ORDER BY prop3
  
  WHERE %VID BETWEEN 3000000 AND 30000200
)

Class mp.test Extends (%Persistent, %Populate)
{

Property prop1 As %Integer(POPSPEC = ".PopulateProp1()", VALUELIST = ",1,2,3,4,5,6,7,8,9,10");

Property prop2 As %Boolean(POPSPEC = "Integer(0,1)");

Property prop3 As %String(MAXLEN = "", POPSPEC = "LastName()");

Index prop1Index On prop1 [ Type = bitmap ];

Index prop2Index On prop2 [ Type = bitmap ];

Index prop3Index On prop3;

ClassMethod Fill(total)
{
    ;d DISABLE^%NOJRN
    ;d ..%KillExtent()
    ;do ..Populate(5000000)
}

Method PopulateProp1() As %Integer
{
    set tInt = $random(19) + 1
    set:(tInt > 10) tInt = 2
    return tInt
}

ClassMethod Query()
{
    s sql=##class(%SQL.Statement).%New()
    s query = "select %ID,prop3 from mp.test where %ID in  (SELECT * FROM  (SELECT TOP ALL %ID FROM mp.test  WHERE prop1 = 2 ORDER BY prop3) WHERE %VID BETWEEN 3000000 AND 30000005)"
    w !,"Query Before Prepare:  ",$zh
    s sc=sql.%Prepare(query)
    w !,"Query After Prepare:  ",$zh
    i $$$ISOK(sc) {
        s rset=sql.%Execute()
        w !,"Query After Execute:  ",$zh
        i 'rset.%SQLCODE {
            s t1=$zh
            d rset.%Next()
            w $zh-t1,!
            d rset.%Display()
        }
        w !,"Query All Processed:  ",$zh
    } else {
        w $System.Status.GetErrorText(sc)
    }
}

/// d ##class(mp.test).Test()
ClassMethod Test()
{
    w !,"Before fill: ",$zh
    d ..Fill()
    w !,"After fill: ",$zh
    do ..Query()
}

}

Terminal Output:

DEV>d ##class(mp.test).Test()
 
Before fill: 84807.575139
After fill: 84807.575144
Query Before Prepare:  84807.575158
Query After Prepare:  84807.666853
Query After Execute:  84807.6669688.009005
ID      prop3
 
0 Rows(s) Affected
Query All Processed:  84815.676129

Try adding a new index and don't forget make rebuild index/tunetable/recompile class

Index idx On (prop1, prop3) [ Type = bitmap ];

Here yet need the help of @Kyle.Baxter.

PS: by the way, check

select count(*from mp.test where prop1=2

to insert the correct values in the code

%VID BETWEEN 3000000 AND 30000005

Because of this, is obtained

0 Rows(s) Affected

Try it with between 1 and 5 on the %VID. Just as slow. It is not about the 0 result, but about the time it takes.

The count on prop1=2 is around 2.7M/

Regarding the index:
Index idx On (prop1, prop3) [ Type = bitmap ];

I can't create an index on every possible filter and order by combination. The client selects what to filter on and what to order by.

We use single field indices to cater for this, and combined field indexes for uniqueness constraints. 

I'm not sure what you mean with "linear cost pagination", but as soon as your query plan gets more complex than reading a single stream (and especially when sorting) there is upfront work before you can start reading the first row so any pagination of that final result simply cannot be linear. 

That small disclaimer aside, "customizable filtering and sorting" sound like application-level features you'd apply on top of the result set returned by SQL and the %ScrollableResultSet class offers something along those lines. If you're looking for server-side support, IRIS SQL currently only supports the TOP clause, but we have support for LIMIT / OFFSET semantics queued for development.

I would go with an (old) approach for pagination:
1. Store only the IDs/page in a temporary table
2. For any specific page, get the IDs and query the data from the main table

The pagination class:

Class DB.TempSQL Extends %Library.Persistent [ ClassType = persistent, Not ProcedureBlock ]
{
Index Main On (Session, QueryName, PageNo) [ IdKey, Unique ];
Property IDs As list Of %String(TRUNCATE = 1);
Property PageNo As %Integer; Property QueryName As %String(TRUNCATE = 1);
Property Session As %String(TRUNCATE = 1);
Query PageNumbers(session As %String, queryname As %String) As %SQLQuery(CONTAINID = 0, ROWSPEC = "PageNo:%Integer")
{ SELECT PageNo FROM TempSQL WHERE (Session = :session) AND (QueryName = :queryname) }
 

The function to populate the pagination class:

ClassMethod BuildTempSql(SQL As %Integer, session As %String, queryname As %String, PageSize As %Integer, Special As %Boolean = 0) [ ProcedureBlock = 0 ]
{
(SQL,session,queryname,PageSize,Special,%session)
&sql(delete from DB.TempSQL Where Session = :session and QueryName = :queryname)
rs=##class(%ResultSet).%New()
rs.Prepare(SQL), rs.Execute()
(count,page,SQLCODE)=0,TimeStart=$P($ZTS,",",2)
entry=##class(DB.TempSQL).%New()
entry.Session=session,entry.QueryName=queryname
F  {
   I (rs.Next()'=1) {
   I (entry.IDs.Count()>0) {
      S page=$I(page),entry.PageNo=page entry.%Save()
      K entry
      }
   Quit   ; last one !
   }
   Else {
      I queryname'="Search"||'##class(Utils.Lists).CheckIBlack(rs.GetData(1)) {
         S count=$I(count) entry.IDs.Insert(rs.GetData(1))
         }
   I (count=PageSize) {
      S page=$I(page),entry.PageNo=page
      D entry.%Save() entry
      S count=0,entry=##class(DB.TempSQL).%New()
      S entry.Session=session,entry.QueryName=queryname
      }
   }
}
TimeEnd=$p($zts,",",2)
%session.Data("MaxPageNo"_$S(Special:queryname,1:""),1)=page
%session.Data("Page",1)=$S(page>0:1,1:0)
%session.Data("SqlSearchTime",1)=$FN(TimeEnd-TimeStart,",",3)
rs.Close() rs
}

Code for specific page data retrival:

##class(DB.TempSQL).%ExistsId(session_"||"_queryname_"||"_page) {
  
entry = ##class(DB.TempSQL).%OpenId(session_"||"_queryname_"||"_page)
   i=1:1:entry.IDs.Count() {
      
id = entry.IDs.GetAt(i)
       &sql(
select  ID, prop1, prop2, prop3 into :ID, :p1, :p2, :p3 from any.Table where ID = :id))
      }
   }

Looking at your example class, and maybe its not practical as we aren't seeing the entire context of your real world usage but might you see better resuilts if you were using instead of a composite index on prop1, prop2, prop3 but rahter bitmap indices on prop1 as it seems like it is the field name and would have a suffeceintly small number of values, a bitmap index on prop2, and a traditional index on prop3.  

If you were to show the query plan that would also provide some insight.

My view on this solution started as that is will not work practically, but the more I thought about the scenarios, the more feasible it looks. My thinking was as follows.

When is temp table built, when the page is opened? Because that will cause a waiting time for the first load, especially if no filters have been applied yet. But in this scenario, you can just select like the top 200 or something else, as long as it is not all the rows.

Then also, every time a filter changes or the page size selection changes, you will have to rebuild the temp table. This means your temp table's lookup needs to include filters and page size to determine that it has changed for the session. It is not in the solution above, but not difficult to implement.


If you hit the indexes correct and get a small result this may be useful.

What about REST APIs, which are usually built to end the session after each request.
This will not work for REST APIs requiring pagination. There can be worked around this by letting the front-end pass in a value for the session which is more related to the front-end session.

You will also need to build some cleanup mechanism to purge rows after a session has ended. The front-end can send you some instruction on a logout, but not if the browser is just closed. It will have to be a task that runs at night or some other random time and truncate it.