Cache Dynamic SQL Pagination
Would like to know if there is an alternative or better way to paginate through a dataset using dynamic SQL than what I am using below. The problem is that as the potential pool of data gets larger, this code slows down to the point of not being useable. In analyzing each line of code below, it appears the slow down is related to the initial rset.%Next() iteration. Is there anything available which does not require a subquery/%VID such as a simple LIMIT/OFFSET?
My code is similar to :
s sql=##class(%SQL.Statement).%New()
s query="SELECT *,%VID FROM (SELECT prop FROM table WHERE prop=x) WHERE %VID BETWEEN 1 AND 100"
s sc=sql.%Prepare(query)
s rset=sql.%Execute()
while rset.%Next() {.....
See:
PS: need to replace the heading on "Speedup SQL pagination"
Thanks for the suggestions, but I'm afraid none of these will fix my problem.
- ScrollableResultSet has the most promise, but still does not satisfy what I need. The data is constantly being updated so using this would not be viable.
-The second suggestion is what I am doing above. Using %VID with a subquery is just too slow when dealing with large datasets. For ex. The following query has a potential pool of 1mil results. SELECT *,%VID FROM (SELECT prop FROM table WHERE prop=x) WHERE %VID BETWEEN 1 AND 100. Eventhough I am only asking for the first 100, there is still a major performance hit when executing the first rset.%Next() due to, what I assume is, the code trying to find the 100 records I am requesting out of the 1 mil records.
- The third article mentions a few options of which some are not available in Cache, such as LIMIT and OFFSET. Also using Cursors or Keysets would not be viable as my application allows jump to, sorting and filtering functionality.
- And the 4th talks about using LIMIT and OFFSET which are, again, not available in Cache.
Any other thoughts?
So, in order.
Caché has analogues - TOP N and %vid, which with more than replace the LIMIT/OFFSET. In the second link this is discussed in detail.
The essence of the article is to replace query
to
In this case, the following query is sufficient:
Is there an index for "prop"?
You tuned the table?
I have all works quickly:
Result:
I'm sure I'm missing something, but so far, I'm just not seeing how this would speed up my query.
My reason for thinking the 4th article wouldn't work for me is because I am allowing the user to jump to a page, and with filtering and sorting also available (as well as a constantly changing database), I don't see how I could determine the keyval/id to use in the query.
Also, this query will work when grabbing the first page, but not for subsequent pages SELECT TOP 100 prop FROM table WHERE prop=?
The TOP N and %vid are very slow when dealing with a large result set.
The example you give is great, but I don't think it has 1mil possible matches which are being filtered down to 100 results (not 100% sure about that statement..:)). This is the problem I am having with using TOP N and %vid.
For example, say I want to grab all data for the month of July that match some specific search parameters (which happens to result in 1,000,000 matched items), and since I am currently on page 5 (and each page displays 100 results), I want to offset the results by 400 items. The first %Next() takes a tremendous performance hit.
s sql=##class(%SQL.Statement).%New()
s query="SELECT *,%VID FROM (SELECT prop FROM table WHERE prop=val) WHERE %VID BETWEEN 401 AND 500"
s sc=sql.%Prepare(query)
s rset=sql.%Execute()
s t=$zh d rset.%Next() w $zh-t
6.105871
Could you provide a complete example similar to mine (code + data + queries with sorting and filtering) that you have a issue with?
How much all rows in the table?
How much RAM?
Which version $zv?
Pardon the formatting but please see below. Conclusion - Every time I add 100,000 to the table, it takes longer to perform the initial %Next(), even though I'm still only asking for the first 5 results. This is a simple table with a simple query. The one I'm dealing with in my app is a much more complex table and query with an ORDER BY clause which slows it down even more, but the example below gives an idea of what I am talking about.
Class mp.test Extends (%Persistent)
{
Property prop1;
Property prop2;
Property prop3;
Index idx3 On (prop1, prop2, prop3);
ClassMethod Fill(total)
{
;d DISABLE^%NOJRN
;d ..%KillExtent()
s prop1="name"
s prop2="prop2"
f i=1:1:total {
s prop3=$r(30)
s id=$g(^mp.testD)+1
s ^mp.testD(id)=$lb(""," "_prop1," "_prop2," "_prop3)
s ^mp.testI("idx3",$$$SQLUPPER(prop1),$$$SQLUPPER(prop2),$$$SQLUPPER(prop3),id)=""
s ^mp.testD=$g(^mp.testD)+1
}
}
ClassMethod Query()
{
s sql=##class(%SQL.Statement).%New()
s query = "SELECT *,%vid FROM (SELECT %ID,prop3 FROM mp.test WHERE prop1='name' AND prop3>='1' AND prop3<='30' AND prop2='prop2') WHERE %VID BETWEEN 1 AND 5"
s sc=sql.%Prepare(query)
i $$$ISOK(sc) {
s rset=sql.%Execute()
i 'rset.%SQLCODE {
s t1=$zh
d rset.%Next()
w $zh-t1,!
d rset.%Display()
}
}
}
}
TEST 1 - Loading 100,000 items into table. Query takes .314 secs
user>D ##class(mp.test).Fill(100000)
user>D ##class(mp.test).Query()
.314167 secs
ID prop3 Literal_3
2 19 2
3 19 3
4 19 4
5 15 5
5 Rows(s) Affected
------------------------------------------------
TEST 2 - adding another 100,000 to table. Same query takes .64secs
user>D ##class(mp.test).Fill(100000)
user>D ##class(mp.test).Query()
.640992 secs
ID prop3 Literal_3
2 15 2
3 23 3
4 26 4
5 19 5
5 Rows(s) Affected
--------------------------------------------------------
TEST 3 - another 100,000. query takes .865 secs
user>D ##class(mp.test).Fill(100000)
user>D ##class(mp.test).Query()
.865654 secs
ID prop3 Literal_3
2 15 2
3 23 3
4 26 4
5 19 5
5 Rows(s) Affected
----------------------------------------
TEST 4 - another 100,000 takes 1.16secs
user>D ##class(mp.test).Fill(100000)
user>D ##class(mp.test).Query()
1.169932 secs
ID prop3 Literal_3
2 15 2
3 23 3
4 26 4
5 19 5
--------------------------------------------
TEST 5 - another 100,000. query takes 1.44 secs
user>D ##class(mp.test).Fill(100000)
user>D ##class(mp.test).Query()
1.44849 secs
ID prop3 Literal_3
2 15 2
3 23 3
4 26 4
5 19 5
5 Rows(s) Affected
And if so?
Whoa!!! Upon initial testing that is lightning fast!!! Went from +1s down to .001. I'm going to fool around with this some more but so far this is exactly what I was looking for. Thank you so much!!!!
is there a way to ORDER this? Say by prop3 DESC
I am having similar issues.
The speed is faster, but the order by is necessary.
Our table holds around 5M entries, with the select done as per above, it still takes 7 to 8 seconds to load each page, which is unacceptable for my client.
Have you tried this?
Yes. The top all kills the performance.
59.516 seconds from 7.8 seconds without the order by and top all.
Source code
USER>d ##class(mp.test).Fill(5000000)
USER>d ##class(mp.test).Query()
count=3833346
first 5 [1:5]
ID prop3
3 3
4 3
24 3
30 3
97 3
5 Rows(s) Affected
.000328 secs
last 5 [3833342:3833346]
ID prop3
4999798 1
4999817 1
4999836 1
4999866 1
4999947 1
5 Rows(s) Affected
2.884304 secs
PS: for those who put a minus. May I ask why?
Have you tried that with an order by on one of the fields?
When doing the order by, you need to select a top all on the inner query so that it can order by.
The sorting needs to happen before the %VID filters areapplied.
Example: Filter on a property that has a bitmap index for many records.
order by another property wich is an alphanumeric value, which is not unique.
I don't think I quite understand you.
That is, you need to:
If not, provide a ready-made example with data similar to the example by @MARK PONGONIS
Let's change the data and then test it with an order by:
Prop1 is i %Integer with valuelist of 1 to 10. Populated randomly, with 60% of the values being 2.
Prop2 is can stay as is and not relevant in this test.
Prop3 is a %String of length 20. Populate with 4M random values.
Prop1 has a bitmap index.
Prop2 has a bitmap index.
Prop3 has an index.
To use order by, it needs to look like this, and this is a lot slower:
select %ID,prop3
from mp.test
where %ID in
(SELECT * FROM
(SELECT TOP ALL %ID
FROM mp.test
WHERE prop1 = 2
ORDER BY prop3
WHERE %VID BETWEEN 3000000 AND 30000200
)
Class mp.test Extends (%Persistent, %Populate)
{
Property prop1 As %Integer(POPSPEC = ".PopulateProp1()", VALUELIST = ",1,2,3,4,5,6,7,8,9,10");
Property prop2 As %Boolean(POPSPEC = "Integer(0,1)");
Property prop3 As %String(MAXLEN = "", POPSPEC = "LastName()");
Index prop1Index On prop1 [ Type = bitmap ];
Index prop2Index On prop2 [ Type = bitmap ];
Index prop3Index On prop3;
ClassMethod Fill(total)
{
;d DISABLE^%NOJRN
;d ..%KillExtent()
;do ..Populate(5000000)
}
Method PopulateProp1() As %Integer
{
set tInt = $random(19) + 1
set:(tInt > 10) tInt = 2
return tInt
}
ClassMethod Query()
{
s sql=##class(%SQL.Statement).%New()
s query = "select %ID,prop3 from mp.test where %ID in (SELECT * FROM (SELECT TOP ALL %ID FROM mp.test WHERE prop1 = 2 ORDER BY prop3) WHERE %VID BETWEEN 3000000 AND 30000005)"
w !,"Query Before Prepare: ",$zh
s sc=sql.%Prepare(query)
w !,"Query After Prepare: ",$zh
i $$$ISOK(sc) {
s rset=sql.%Execute()
w !,"Query After Execute: ",$zh
i 'rset.%SQLCODE {
s t1=$zh
d rset.%Next()
w $zh-t1,!
d rset.%Display()
}
w !,"Query All Processed: ",$zh
} else {
w $System.Status.GetErrorText(sc)
}
}
/// d ##class(mp.test).Test()
ClassMethod Test()
{
w !,"Before fill: ",$zh
d ..Fill()
w !,"After fill: ",$zh
do ..Query()
}
}
Terminal Output:
DEV>d ##class(mp.test).Test()
Before fill: 84807.575139
After fill: 84807.575144
Query Before Prepare: 84807.575158
Query After Prepare: 84807.666853
Query After Execute: 84807.6669688.009005
ID prop3
0 Rows(s) Affected
Query All Processed: 84815.676129
Try adding a new index and don't forget make rebuild index/tunetable/recompile class
Here yet need the help of @Kyle.Baxter.
PS: by the way, check
to insert the correct values in the code
Because of this, is obtained
Try it with between 1 and 5 on the %VID. Just as slow. It is not about the 0 result, but about the time it takes.
The count on prop1=2 is around 2.7M/
Regarding the index:
Index idx On (prop1, prop3) [ Type = bitmap ];
I can't create an index on every possible filter and order by combination. The client selects what to filter on and what to order by.
We use single field indices to cater for this, and combined field indexes for uniqueness constraints.
I ran your code with new index.
count(prop1=2) = 2629833
%VID BETWEEN 1 AND 5
.000293 secs
%VID BETWEEN 2629829 AND 2629833
3.63472 secs
Waiting for @Kyle.Baxter response.
We really need to stop narrowing this). Filed.
So isn´t there a solution for linear cost pagination with customizable filtering and ordering?
THanks for the question @Rodrigo Werneck
Calling for @Benjamin De Boe and @Bob Kuszewski to help.
I'm not sure what you mean with "linear cost pagination", but as soon as your query plan gets more complex than reading a single stream (and especially when sorting) there is upfront work before you can start reading the first row so any pagination of that final result simply cannot be linear.
That small disclaimer aside, "customizable filtering and sorting" sound like application-level features you'd apply on top of the result set returned by SQL and the %ScrollableResultSet class offers something along those lines. If you're looking for server-side support, IRIS SQL currently only supports the TOP clause, but we have support for LIMIT / OFFSET semantics queued for development.
link to a solution with IRIS 2021.1+ with the use of the new window (OVER) function :
https://community.intersystems.com/post/scrollable-resultset-pagination-...
Social networks
InterSystems resources
Log in or sign up
Log in or create a new account to continue
Log in or sign up
Log in or create a new account to continue
Log in or sign up
Log in or create a new account to continue