A Peek at IRIS Data Platform Performance Strategies

Nicole Sun · Mar 25, 2024 7m read

#Best Practices #Tips & Tricks #Vector Search #InterSystems IRIS

In the business world, every second counts, and having high-performing applications is essential for streamlining our business processes. We understand the significance of crafting efficient algorithms, measurable through the big O notation.

Algorithmic efficiency is only part of the picture, however. There are numerous other strategies to boost the performance of systems built on the IRIS Data Platform, and they are equally crucial for optimizing overall efficiency.

Let's take a sneak peek at some tips for making the IRIS Data Platform work better, where every little trick will help your applications shine.

1. Using Indexes

Indexing serves as a means to optimize queries by maintaining an organized subset of frequently requested data. Within the IRIS Data Platform, various index types cater to specific needs:

Standard Indexes: These are persistent arrays associating indexed values with the RowID(s) of the corresponding rows.

Example:

Index NameIDX ON Name;
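
To make the idea concrete, here is a conceptual Python sketch (not IRIS code) of what a standard index provides: a mapping from each indexed value to the RowIDs that hold it, so a lookup does not need to examine every row. The table contents and the helper names `build_index` and `scan_lookup` are hypothetical.

```python
from collections import defaultdict

# Hypothetical table: RowID -> row (this plays the role of the master map).
rows = {
    1: {"Name": "Smith", "Region": "East"},
    2: {"Name": "Jones", "Region": "West"},
    3: {"Name": "Smith", "Region": "West"},
    4: {"Name": "Adams", "Region": "East"},
}

def build_index(table, column):
    """Map each value of `column` to the sorted list of RowIDs holding it."""
    index = defaultdict(list)
    for row_id in sorted(table):
        index[table[row_id][column]].append(row_id)
    return dict(index)

def scan_lookup(table, column, value):
    """Full scan: every row is examined to find the matches."""
    return [rid for rid in sorted(table) if table[rid][column] == value]

name_idx = build_index(rows, "Name")
print(name_idx["Smith"])                    # RowIDs found directly via the index
print(scan_lookup(rows, "Name", "Smith"))   # same RowIDs, but every row was read
```

Both calls return the same RowIDs. The difference is the amount of data touched to get there, which is what matters on a large table.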

Bitmap Indexes: A special index type that uses bitstrings to represent the set of RowID values corresponding to a given indexed value.

Example:

Index RegionIDX ON Region [Type = bitmap];
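
The following is a conceptual Python sketch (not IRIS code, and not a description of the IRIS on-disk format) of why bitstrings suit low-cardinality columns: each distinct value gets one bitstring, and combining conditions becomes a bitwise operation. The `Active` column and all helper names are hypothetical.

```python
# Hypothetical columns, keyed by RowID 1..6.
regions = {1: "East", 2: "West", 3: "East", 4: "North", 5: "West", 6: "East"}
active = {1: True, 2: True, 3: False, 4: True, 5: False, 6: True}

def build_bitmaps(column):
    """One integer per distinct value; bit (RowID - 1) is set when the row has that value."""
    bitmaps = {}
    for row_id, value in column.items():
        bitmaps[value] = bitmaps.get(value, 0) | (1 << (row_id - 1))
    return bitmaps

def row_ids(bitmap):
    """Decode a bitmap back into the list of RowIDs it represents."""
    return [i + 1 for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]

region_bits = build_bitmaps(regions)
active_bits = build_bitmaps(active)

# A condition such as Region = 'East' AND Active = 1 becomes a single bitwise AND.
east_and_active = region_bits["East"] & active_bits[True]
print(row_ids(east_and_active))          # RowIDs satisfying both conditions
print(bin(east_and_active).count("1"))   # counting rows is a population count
```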

Bitslice Indexes: This special index allows rapid evaluation of specific expressions, such as sums and range conditions.

Example:

Index SalaryIDX ON Salary [Type = bitslice];
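
As a rough illustration of how a bit-sliced layout can evaluate a sum quickly, the conceptual Python sketch below (not IRIS code) stores one bitmap per binary digit of the value. The sum is then the population count of each slice weighted by its power of two. The data, the restriction to non-negative integers, and the helper names are assumptions made for the example.

```python
# Hypothetical Salary column (non-negative integers), keyed by RowID 1..5.
salaries = {1: 5, 2: 12, 3: 7, 4: 0, 5: 9}

def build_bitslices(column):
    """slices[k] is a bitmap whose bit (RowID - 1) is set when bit k of that row's value is 1."""
    width = max(column.values()).bit_length()
    slices = [0] * width
    for row_id, value in column.items():
        for k in range(width):
            if (value >> k) & 1:
                slices[k] |= 1 << (row_id - 1)
    return slices

def bitslice_sum(slices, row_filter=None):
    """SUM over the selected rows: slice k contributes 2**k times its population count."""
    total = 0
    for k, bitmap in enumerate(slices):
        if row_filter is not None:
            bitmap &= row_filter  # keep only the rows selected by the filter bitmap
        total += (1 << k) * bin(bitmap).count("1")
    return total

salary_slices = build_bitslices(salaries)
print(bitslice_sum(salary_slices))          # sum over all rows
print(bitslice_sum(salary_slices, 0b11))    # sum over RowIDs 1 and 2 only
```

Note that the sum never reads the individual values: it works entirely on the slices, and a row filter expressed as a bitmap combines with them through a bitwise AND.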

Columnar Indexes: Specifically designed for very fast queries, especially those involving filtering and aggregation operations, on columns whose underlying data is stored row-wise.

Example:

Index AmountIDX ON Amount [Type = columnar];

2. Query Plan

We want to ensure that our queries use the indexes we have defined. However, sometimes they are not executed as expected. For instance, if 'ColumnName IS NOT NULL' is used in the query's WHERE clause, the query will skip the index even if one is defined on ColumnName. Therefore, it is not recommended to use 'ColumnName IS NOT NULL' if ColumnName has an index defined.

How can we determine whether a query uses the index map? The Query Plan is the tool that shows whether the query uses the index map or simply traverses the entire master map.

How do we use the Query Plan?

Run Show Plan, either with the SQL EXPLAIN command or with the Show Plan option in the Management Portal -> System Explorer -> SQL, then find the first map in the plan. If the first bullet item in the Query Plan is “Read master map”, or the Query Plan calls a module whose first bullet item is “Read master map”, the query first map is the master map rather than an index map. Because the master map reads the data itself, rather than an index to the data, this almost always indicates an inefficient Query Plan. Unless the table is relatively small, we should create an index so that when we rerun this query the Query Plan first map says “Read index map.”

3. Query Optimizer and Tune Table

When determining the optimal execution strategy for a given SQL query, the Query Optimizer takes into account three key factors:

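ExtentSize: the row count for each table used within the query.
Selectivity: the percentage of rows expected to match a single typical value of a column, calculated for each column used by the query. The more distinct values a column has, the smaller this percentage.
BlockCount: the number of blocks occupied by each SQL map used by the query.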
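
As a rough illustration of the first two statistics, the Python sketch below (not IRIS code) computes a row count and a simplified selectivity for a hypothetical table. It assumes selectivity is the share of rows matched by one value when values are evenly spread, that is 100% divided by the number of distinct values, and it ignores skewed distributions. The data and function names are made up for the example.

```python
# Hypothetical table contents: one dict per row.
rows = [
    {"Region": "East", "Status": "A"},
    {"Region": "West", "Status": "A"},
    {"Region": "East", "Status": "B"},
    {"Region": "North", "Status": "A"},
    {"Region": "East", "Status": "A"},
    {"Region": "West", "Status": "A"},
    {"Region": "South", "Status": "A"},
    {"Region": "East", "Status": "A"},
]

def extent_size(table):
    """ExtentSize: the number of rows in the table."""
    return len(table)

def selectivity(table, column):
    """Simplified selectivity in percent: 100 / number of distinct values in the column."""
    distinct = {row[column] for row in table}
    return 100.0 / len(distinct)

print(extent_size(rows))
print(selectivity(rows, "Region"))   # 4 distinct values
print(selectivity(rows, "Status"))   # 2 distinct values
```

Under this simplification, a condition on Region is expected to match fewer rows than a condition on Status, which is the kind of comparison an optimizer can use when choosing between indexes.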

These statistics can be specified in the persistent class storage definition.

To guarantee accurate decision-making by the Query Optimizer, it is crucial to set these values correctly.

We have the option to explicitly define any of these statistics when creating a class (table) before inserting data into the table.
Following the population of the table with representative data, we can utilize Tune Table to compute these statistics.
Subsequent to running Tune Table, we can replace a calculated statistic by specifying an explicit value.

We can assess the statistics we have explicitly defined against the results generated by Tune Table. If Tune Table's assumptions prove less than optimal for the Query Optimizer, choosing an explicitly set statistic over the one generated by Tune Table becomes a viable alternative.

What is Tune Table?

Tune Table is a utility designed to analyze the data within a table, providing insights into ExtentSize, the distribution of distinct values in each field, and the Average Field Size (average length of values in each field). Additionally, it computes the BlockCount for each SQL map. We have the option to instruct Tune Table to leverage this information for updating the metadata associated with a table and its fields. Subsequently, the query optimizer utilizes these statistics to determine the most efficient execution plan for a query.

It is recommended to run Tune Table on a table after populating it with a representative volume of actual data. Typically, running Tune Table once, as a final step in application development before the data goes live, is sufficient. In certain scenarios, IRIS automatically executes Tune Table the first time a SELECT query is performed on a table.

However, there are also manual ways to run Tune Table:

Using the Management Portal SQL interface Actions drop-down list.
Invoking the $SYSTEM.SQL.Stats.Table.GatherTableStats() method for a single table, or all tables in the current namespace.
Issuing the SQL command TUNE TABLE for a single table.

4. Columnar Storage

In columnar storage, primary data is stored in one global per column. Sequences of 64,000 data elements are stored in separate global subscripts. Data is encoded using a vector encoding that is optimized for storing elements of the same data type. In general, analytical queries run quickly but transactions might be slower.
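
The layout described above can be pictured with a small conceptual Python sketch (not IRIS code and not the actual vector encoding): one column is split into chunks of 64,000 elements, and an aggregate visits only the chunks of the column it needs. The `Amount` data and the helper names are hypothetical.

```python
CHUNK_SIZE = 64_000  # elements per chunk, as described above

def to_chunks(values, chunk_size=CHUNK_SIZE):
    """Split one column into fixed-size chunks; each chunk stands in for one global subscript."""
    return [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]

def column_sum(chunks):
    """Aggregate a single column by visiting only that column's chunks."""
    return sum(sum(chunk) for chunk in chunks)

# Hypothetical Amount column with 150,000 rows.
amounts = list(range(150_000))
amount_chunks = to_chunks(amounts)

print(len(amount_chunks))        # number of chunks needed for the column
print(len(amount_chunks[-1]))    # the last chunk is only partly filled
print(column_sum(amount_chunks) == sum(amounts))
```

The sketch also hints at why single-row transactions can be slower: reading or changing one complete row means touching one chunk in every column.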

When should we choose columnar storage to improve performance?

Filtering and aggregating data in specific columns to perform analytical queries (OLAP).
Data is not frequently updated, inserted, or deleted, or is updated in bulk.

5. Avoiding Frequently Opening Objects

Frequently opening objects can slow an application down. Therefore, where it makes logical sense, we should open an object once and reuse the reference rather than opening the same object repeatedly.

When we only need to return an object property value, we can use ##class(ClassName).PropertyGetStored(id), where Property stands for the name of the property. This built-in method is faster than opening the object with %OpenId() and then reading object.Property.

6. Using Work Queue Manager

When a substantial process needs to be completed and certain parts of it can run concurrently, it is advisable to consider processing those parts in parallel.

The Work Queue Manager allows us to enhance performance by programmatically distributing work to multiple concurrent processes.

How to use Work Queue Manager?

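// Create the work queue; status reports whether initialization succeeded
Set queue = $system.WorkMgr.Initialize("/multicompile=1",.status)
If status {
  // Queue 100 units of work, stopping at the first failure so the error is not overwritten
  For i=1:1:100 {
    Set status = queue.Queue("##class(ClassName).ClassMethod",i)
    Quit:'status
  }
}
// Wait for the queued work to finish only if everything was queued successfully
If status Set status = queue.WaitForComplete()
If 'status {
  Do $system.Status.DisplayError(status)
}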

7. Performance Monitoring Tools

There are a few system monitoring tools available in the IRIS Data Platform. Here, we will take a quick look at ^%SYS.MONLBL and ^SystemPerformance.

^%SYS.MONLBL is a line-by-line monitor, providing a way to diagnose where time is spent executing selected code in routines. This utility allows us to monitor and identify which part of the code has a performance problem.

To start the monitor, use

%SYS>Do ^%SYS.MONLBL

^SystemPerformance is a system snapshot tool for collecting detailed performance data about an IRIS Data Platform instance and the platform on which it is running. The resulting report can aid in diagnosing system problems and can be run in the terminal or in the Management Portal. By default, the output directory for the report is the install-dir\mgr directory of the IRIS instance.

To start the monitor, use

%SYS>do ^SystemPerformance

To stop a running profile, abort data collection, and delete its log files, use

%SYS>do Stop^SystemPerformance(runid)

Alternatively, to stop the job without deleting log files and produce an HTML performance report from those log files, use

%SYS>do Stop^SystemPerformance(runid, 0)

8. Checking Performance by Counting Globals

We can utilize the following class method, which returns the number of global references made by a specified process: ##class(%SYSTEM.Process).GlobalReferences($JOB)

To use it for performance monitoring, we can call this class method at the beginning and at the end of the process. The difference between the two values is the number of global references made in between. In general, the more global references a process makes, the slower it runs.

Conclusion

As we conclude this journey, I hope you have found these insights valuable for enhancing your applications' performance. Feel free to implement these tips and witness the positive impact on your systems. For a deeper dive into these strategies and to uncover more valuable performance enhancement insights, explore our comprehensive online documentation. Thank you for joining this exploration, and may your applications continue to thrive on the IRIS Data Platform.

More Performance Improvement Materials: (Thanks to @Vitaliy Serdtsev and @Benjamin De Boe )

Optimizing SQL^IRIS (Best Practices for Improving SQL Performance, Configure SQL Performance Options, etc.)
Performance and Programming Considerations^Caché (for ECP)
Performance tips for Business Intelligence^IRIS (DeepSee)
Performance&Security Tips and Information^IRIS (for Persistent Classes)
2021.2 SQL Feature Spotlight - Smart Sampling & Automation for Table Statistics
CI/CD with IRIS SQL

Stephen Canzano · Mar 27, 2024

Regarding

If the first bullet item in the Query Plan is “Read master map”, or the Query Plan calls a module whose first bullet item is “Read master map”, the query first map is the master map rather than an index map. Because the master map reads the data itself, rather than an index to the data, this almost always indicates an inefficient Query Plan. Unless the table is relatively small, we should create an index so that when we rerun this query the Query Plan first map says “Read index map.”

I think it's more subtle than that. My general plan of attack is to review the results of Show Plan and then search for Looping. If the first bullet item in Show Plan is one of the following

Read master map Ens.MessageHeader.IDKEY, using the given idkey value.
Read master map Ens.MessageHeader.IDKEY, looping on ID (with a range condition).

I'm not immediately concerned. In the first case this is going directly to the row which is perfectly fine.

In the second case so long as the range condition is not going to read the entire extent I can accept that and not look for a better query plan.

Honestly, whether it's

Read master map Ens.MessageHeader.IDKEY, looping on ID (with a range condition).

Read index map Ens.MessageHeader.TimeCreated, looping on TimeCreated (with a range condition) and ID.

I don't care if it's the master map or the index map; what I'm interested in is looping, and whether looping causes the engine to look at the entire extent or index.

Andrew Aho · Dec 11, 2023

Nice overview, thanks @Nicole Sun!

Ben Spead · Mar 25, 2024

@Nicole Sun - thank you for putting this together ... it's a great overview on which to structure a broad understanding of InterSystems IRIS performance considerations :)

Vitaliy Serdtsev · Mar 26, 2024

I think it will also be useful to mention here the links to the documentation:

Optimizing SQL^IRIS (Best Practices for Improving SQL Performance, Configure SQL Performance Options, etc.)
Performance and Programming Considerations^Caché (for ECP)
Performance tips for Business Intelligence^IRIS (DeepSee)
Performance&Security Tips and Information^IRIS (for Persistent Classes)

Nicole Sun · Apr 3, 2024

Thanks Vitaliy!

Thanks for sharing your thoughts and experience. Looping is something important too!

In terms of the master map vs. the index map, it really depends on the case. If there is a large amount of data in the table, looping through a master map can be much less efficient than looping through an index map; ideally, an index map should have a lot less data to go through. The Query Plan gives some idea of whether the query hits the intended indices and whether the indices are built as expected to help performance. If neither, it may be a good idea to improve the indices or optimise the query itself.

Benjamin De Boe · Apr 3, 2024

Nice article!

as a complement to the section on Tune Table, I'd like to refer to this article I wrote about a bit of automation we put in the product in 2021.2 (that we intend to enhance this year), and also this one on caveats wrt packaging statistics

Great articles of yours! Thanks for sharing!

Developer Commu... · Oct 11, 2024

💡 This article is considered InterSystems Data Platform Best Practice.