Question
Jeremy Forsyth · May 10, 2019

SQL query with Count function running slow

Cache version: Cache for Windows (x86-64) 2017.2.1 (Build 801_3U)

Good Afternoon,

I have a co-worker who is trying to run the below query via ODBC. The issue is that the query runs extremely slowly, taking nearly 2 hours to complete.

SELECT A.RecNo, T.SDSInstID, T.TranEffectDate, COUNT(T.InternalTXID) as NoofTransactions
FROM SDS_DATA._Transaction T
INNER JOIN SDS_DATA.DataFeed A
on A.BankAccountBSB = T.BankAccountBSB
and A.BankAccountNo = T.BankAccountNo
WHERE T.TranEffectDate >= DATEADD(yy,-1,CURRENT_DATE)
GROUP BY A.RecNo, T.SDSInstID, T.TranEffectDate

 

Below is the generated query plan:

Statement Text
DECLARE QRS CURSOR FOR SELECT A . RecNo , T . SDSInstID , T . TranEffectDate , COUNT ( T . InternalTXID ) AS NoofTransactions FROM SDS_DATA . _Transaction T INNER JOIN SDS_DATA . DataFeed A ON A . BankAccountBSB = T . BankAccountBSB AND A . BankAccountNo = T . BankAccountNo WHERE T . TranEffectDate >= DATEADD ( yy , - ? , CURRENT_DATE ) GROUP BY A . RecNo , T . SDSInstID , T . TranEffectDate

 

Query Plan

Relative cost = 119327452

• Call module C, which populates temp-file B.
• Read temp-file B, looping on the hashing subscript.
• For each row:
  • Output the row.

module C

• Call module B, which populates bitmap temp-file A.
• Read bitmap temp-file A, looping on ID.
• For each row:
  • Read master map SDS_Data._Transaction.IDKEY, using the given idkey value.
  • Read index map SDS_Data.DataFeed.BSBAccountNoIndex, using the given %SQLUPPER(BankAccountBSB) and %SQLUPPER(BankAccountNo), and looping on ID.
  • For each row:
    • Read master map SDS_Data.DataFeed.IDKEY, using the given idkey value.
    • Check distinct values for RecNo, %SQLUPPER(SDSInstID), and TranEffectDate using temp-file B, subscripted by a hashing of these values.
    • For each distinct row:
      • Add a row to temp-file B, subscripted by the hashing, with node data of InternalTXID, TranEffectDate, %SQLUPPER(SDSInstID), and RecNo.
    • Update the accumulated count(InternalTXID) in temp-file B, subscripted by the hashing.

module B

• Read bitmap index SDS_Data._Transaction.TranEffectDateIndex, looping on TranEffectDate and bitmap chunks.
• For each bitmap chunk:
  • OR the bitmap chunk into bitmap temp-file A.
In terms of numbers, there are approximately 50,611 DataFeed records and 12,098,782 transaction records. The transaction table grows by approximately 50,000 records a day.

Would anyone here have any advice on how I can improve the performance of this query?

Thanks in advance,

Jeremy
 

Discussion (3)

With such an amount of data, do you have enough global buffers configured?

Is this the only slow query, or do you have others as well?

How big are your database, your index globals, and your global buffer pool?

Hello,

1. Take a deeper look at how you implemented your indexes. According to the "cost", it looks like the query is not optimized.

2. Do you have an index on "TranEffectDate" on the "SDS_DATA._Transaction" table? (See the sketch after this list.)

3. I would recommend that you run this query (after improving the "cost") with parallel processing.

In 2017 and above you can use %Parallel, mentioned here: https://docs.intersystems.com/csp/docbook/DocBook.UI.Page.cls?KEY=GSQLOPT_optquery#GSQLOPT_optquery_parallel
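For example, here is a minimal sketch of what point 2 could look like as DDL. The index name BankAccountIdx is hypothetical, and the right column list should be confirmed against your query plan before you add anything — the plan already uses a bitmap index on TranEffectDate, so the join columns are the more likely gap:

-- Hypothetical index on the join columns used against DataFeed.
CREATE INDEX BankAccountIdx
    ON SDS_DATA._Transaction (BankAccountBSB, BankAccountNo)

After building it, re-running Tune Table (for example via $SYSTEM.SQL.TuneTable from a terminal) and purging cached queries would give the optimizer fresh statistics and a chance to pick the new index up.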

Off the top of my head I'm not positive how to incorporate it into your query, but I have achieved enormous performance improvements using %Parallel in subqueries, so it may be worth a few simple 'show plan' tests; a sketch of the placement follows below. As mentioned above, ensuring you have the proper indexes would be my first step here as well.
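For what it's worth, a minimal sketch of where the keyword would go in the original statement — %PARALLEL is only a hint, and whether the optimizer honors it for this particular join is something a 'show plan' test would confirm:

-- %PARALLEL immediately follows FROM and asks the optimizer to split
-- the work across multiple server processes; it is a hint, not an order.
SELECT A.RecNo, T.SDSInstID, T.TranEffectDate,
       COUNT(T.InternalTXID) AS NoofTransactions
FROM %PARALLEL SDS_DATA._Transaction T
INNER JOIN SDS_DATA.DataFeed A
        ON A.BankAccountBSB = T.BankAccountBSB
       AND A.BankAccountNo = T.BankAccountNo
WHERE T.TranEffectDate >= DATEADD(yy, -1, CURRENT_DATE)
GROUP BY A.RecNo, T.SDSInstID, T.TranEffectDate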