Since Caché 2017 the SQL engine has included a new set of statistics. These record how many times a query is executed and how long it takes to run.
This is a gold mine for anyone monitoring and trying to optimize the performance of an application that includes many SQL statements, but the data isn’t as easy to access as some people would like.
This article and the associated sample code explain how to use this information, and how to routinely extract a summary of the daily statistics so you can keep a historic record of the SQL performance of your application.
What is recorded?
Every time an SQL statement is executed, the time taken is recorded. This is very lightweight and you can’t turn it off. To minimize cost, the statistics are kept in memory and written to disk periodically. The data includes the number of times a query has been executed in the day, and the average and total time taken.
Data is not written to disk immediately, and after it has been written, the statistics are updated by a task, “Update SQL query statistics”, that is usually scheduled to run hourly. That task can be fired off manually, but if you want to see the statistics in real time while testing a query, the whole process requires a little patience.
Warning: In InterSystems IRIS 2019 and earlier, these statistics are only collected on the system where the SQL is compiled, so they are not gathered for embedded SQL in deployed classes or routines. Nothing will break with the sample code, but it might fool you (it fooled me) into thinking everything was OK because nothing showed up as expensive.
How do you see the information?
You can see the list of queries in the Management Portal: go to the SQL page and click on the ‘SQL Statements’ tab. This is good for examining a new query you have just run, but if there are thousands of queries running it can become unmanageable.
The alternative is to use SQL to search for the queries. The information is stored in tables in the INFORMATION_SCHEMA schema. This schema has a number of tables, and I have included some sample SQL queries at the end of this article.
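For a quick first look, you can query the daily statistics table directly. This is a sketch that lists the most expensive statements for one day, using the same table and column names as the queries at the end of this article; the average column is computed here rather than read from the table, and the date format may vary with your locale settings:

```sql
-- Top 10 statements by total execution time for one day (sketch)
SELECT TOP 10 DS.Day, DS.Statement, DS.StatCount, DS.StatTotal,
       DS.StatTotal / DS.StatCount AS AvgSeconds
FROM INFORMATION_SCHEMA.STATEMENT_DAILY_STATS DS
WHERE DS.Day = '08/26/2019'
ORDER BY DS.StatTotal DESC
```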
When are the statistics removed?
The data for a query is removed every time it is recompiled. So for dynamic queries, that might mean when the cached queries are purged. For embedded SQL that means when the class or routine where the SQL is embedded is recompiled.
On a live site it is reasonable to expect that the statistics will be kept for more than a day, but the tables holding them cannot be used as a long-term reference source for running reports or trend analysis.
How can you summarize the information?
I recommend extracting the data every night into permanent tables that are easier to work with when generating performance reports. There is a chance that some information will be lost if classes are compiled during the day, but this is unlikely to make any real difference to your analysis of slow queries.
The code below is an example of how you could extract the statistics into a daily summary for each query. It includes three short classes:
- A task that should be run every night.
- DRL.MonitorSQL is the main class that extracts and stores the data from the INFORMATION_SCHEMA tables.
- The third class, DRL.MonitorSQLText, is an optimization that stores the (potentially long) query text once, so that the daily statistics only need to store the hash of the query.
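In the sample these are defined as persistent classes, but the shape of the data can be sketched as equivalent DDL. This is an illustration inferred from the columns used in the queries below, not the actual class definitions; column types and lengths are assumptions:

```sql
-- Daily summary: one row per statement per day (sketch)
CREATE TABLE DRL.MonitorSQL (
    RunDate     DATE,
    RoutineName VARCHAR(255),
    TotalHits   INTEGER,
    SumpTime    NUMERIC(18,4),
    Hash        VARCHAR(50)
)

-- Query text stored only once per hash (sketch)
CREATE TABLE DRL.MonitorSQLText (
    Hash      VARCHAR(50),
    QueryText VARCHAR(15000)
)
```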
Notes on the sample
The task extracts the previous day’s data and should therefore be scheduled shortly after midnight.
You can also extract more historic data if it still exists, for example the last 120 days.
The sample code reads the ^rIndex global directly because the earliest versions of the statistics did not expose the Date to SQL.
The variation I have included loops through all namespaces in the instance, but that is not always appropriate.
How to query the extracted data
Once the data is extracted, you can find the heaviest queries by running:
SELECT TOP 20 S.RunDate, S.RoutineName, S.TotalHits, S.SumpTime, S.Hash, T.QueryText FROM DRL.MonitorSQL S LEFT JOIN DRL.MonitorSQLText T ON S.Hash=T.Hash WHERE S.RunDate='08/25/2019' ORDER BY S.SumpTime DESC
Also if you choose the hash for an expensive query you can see the history of that query with
SELECT S.RunDate, S.RoutineName, S.TotalHits, S.SumpTime, S.Hash, T.QueryText FROM DRL.MonitorSQL S LEFT JOIN DRL.MonitorSQLText T ON S.Hash=T.Hash WHERE S.Hash='CgOlfRw7pGL4tYbiijYznQ84kmQ=' ORDER BY S.RunDate
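Once several days of history have accumulated, it can also be useful to rank queries by their total cost over the whole period rather than for a single day. This is a sketch against the same summary tables:

```sql
-- Total time and hits per statement across all extracted days (sketch)
SELECT S.Hash, SUM(S.SumpTime) AS TotalTime,
       SUM(S.TotalHits) AS TotalHits, T.QueryText
FROM DRL.MonitorSQL S
LEFT JOIN DRL.MonitorSQLText T ON S.Hash = T.Hash
GROUP BY S.Hash, T.QueryText
ORDER BY TotalTime DESC
```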
Earlier this year I extracted data from a live site and looked at the most expensive queries. One query was averaging less than 6 seconds per execution but was being called 14,000 times per day, adding up to nearly 24 hours of elapsed time every day. Effectively, one core was fully occupied by this one query. Worse still, the second query, which takes an hour, was a variation on the first.
DECLARE C CURSOR FOR SELECT * INTO :%col(1) , :%col(2) , :%col(3) , :%col(4) …
DECLARE C CURSOR FOR SELECT * INTO :%col(1) , :%col(2) , :%col(3) , :%col(4) , …
DECLARE C CURSOR FOR SELECT * INTO :%col(1) , :%col(2) , :%col(3) , :%col(4) , :%col(5) , :%col(6) , :%col(7) ,
DECLARE C CURSOR FOR SELECT * , TKGROUP INTO :%col(1) , :%col(2) , :%col(3) , ..
DECLARE C CURSOR FOR SELECT …
Table 1: Actual results from customer site
Tables in INFORMATION_SCHEMA schema
As well as the statistics, the tables in this schema keep track of where queries, columns, indices etc. are used. Typically INFORMATION_SCHEMA.STATEMENTS is the starting table, and the other tables are joined to it on something like “Statements.Hash=OtherTable.Statement”.
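For example, to find every class or routine that uses a particular statement, you can filter the locations table on the statement hash. This sketch reuses a hash value from this article purely as a placeholder; substitute the hash of the statement you are interested in:

```sql
-- Where is this statement used? (substitute your own hash)
SELECT Loc.Location
FROM INFORMATION_SCHEMA.STATEMENT_LOCATIONS Loc
WHERE Loc.Statement = 'jDqCKaksff/4up7Ob0UXlkT2xKY='
```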
The equivalent query for accessing these tables directly to find the most expensive queries for one day would be…
SELECT DS.Day, Loc.Location, DS.StatCount, DS.StatTotal, S.Statement, S.Hash FROM INFORMATION_SCHEMA.STATEMENT_DAILY_STATS DS LEFT JOIN INFORMATION_SCHEMA.STATEMENTS S ON S.Hash=DS.Statement LEFT JOIN INFORMATION_SCHEMA.STATEMENT_LOCATIONS Loc ON S.Hash=Loc.Statement WHERE DS.Day='08/26/2019' ORDER BY DS.StatTotal DESC
Whether you are considering setting up a more systematic process or not, I recommend that everyone with a large application that uses SQL runs this query today.
If a particular query shows up as expensive, you can get the history by running
SELECT DS.Day, Loc.Location, DS.StatCount, DS.StatTotal, S.Statement, S.Hash FROM INFORMATION_SCHEMA.STATEMENT_DAILY_STATS DS LEFT JOIN INFORMATION_SCHEMA.STATEMENTS S ON S.Hash=DS.Statement LEFT JOIN INFORMATION_SCHEMA.STATEMENT_LOCATIONS Loc ON S.Hash=Loc.Statement WHERE S.Hash='jDqCKaksff/4up7Ob0UXlkT2xKY=' ORDER BY DS.Day