#Indexing


How to index data structures in databases.

Article Thomas Dyar · Jan 25 14m read

TL;DR: This article demonstrates how to run GraphRAG-style hybrid retrieval—combining vector similarity, graph traversal, and full-text search—entirely within InterSystems IRIS using the iris-vector-graph package. We use a fraud detection scenario to show how graph patterns reveal what vector search alone would miss.


Why Fraud Detection Needs Graphs

Every year, businesses and consumers lose billions to fraud. In 2024 alone, consumers reported $12.5 billion lost—a 25% increase year over year. What makes modern fraud so difficult to detect is that fraudsters rarely work alone.

Article Vachan C Rannore · Oct 21, 2025 3m read

Hello!!!

Data migration often sounds like a simple "move data from A to B" task until you actually do it. In reality, it is a complex process that blends planning, validation, testing, and technical precision.

Over several projects where I handled data migration into a HIS which runs on IRIS (TrakCare), I realized that success comes from a mix of discipline and automation.

Here are a few points which I want to highlight.

1. Start with a Defined Data Format.

Before you even open your first file, make sure everyone, especially data providers, clearly understands the exact data format you expect. Defining templates early avoids unnecessary back-and-forth and rework later.

While Excel or CSV formats are common, I personally feel using a tab-delimited text file (.txt) for data upload is best. It's lightweight, consistent, and avoids issues with commas inside text fields. 

PatID   DOB         Gender  AdmDate
10001   2000-01-02  M       2025-10-01
10002   1998-01-05  F       2025-10-05
10005   1980-08-23  M       2025-10-15

Make sure that the date format used in the file is correct and consistent throughout, because these files are usually converted from Excel, and a basic Excel user can easily supply dates in the wrong format. Wrong date formats cause trouble when you convert the values into $HOROLOG format.
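A pre-load check for this can be sketched in Python. This is only an illustration under stated assumptions: the helper names `to_horolog_day` and `check_dates` are hypothetical, the expected format is taken to be YYYY-MM-DD as in the sample above, and the conversion relies on the date part of $HOROLOG counting days from December 31, 1840 (so January 1, 1841 is day 1).

```python
import csv
import io
from datetime import date, datetime

# The date part of $HOROLOG counts days from 1840-12-31 (1841-01-01 is day 1).
HOROLOG_EPOCH = date(1840, 12, 31)


def to_horolog_day(text, fmt="%Y-%m-%d"):
    """Return the $HOROLOG day number for a date string, or raise ValueError."""
    parsed = datetime.strptime(text, fmt).date()
    return (parsed - HOROLOG_EPOCH).days


def check_dates(tsv_text, date_columns=("DOB", "AdmDate")):
    """Return (line_number, column, value) for every date that fails to parse."""
    problems = []
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for line_number, row in enumerate(reader, start=2):  # line 1 is the header
        for column in date_columns:
            try:
                to_horolog_day(row[column])
            except ValueError:
                problems.append((line_number, column, row[column]))
    return problems


# Hypothetical sample: the second data row uses a different date format.
sample = (
    "PatID\tDOB\tGender\tAdmDate\n"
    "10001\t2000-01-02\tM\t2025-10-01\n"
    "10002\t05/01/1998\tF\t2025-10-05\n"
)
print(check_dates(sample))
```

Running a check like this before the upload lets you send the offending line numbers back to the data provider instead of discovering them in the middle of the load.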

Article Karthickraja S · Dec 18, 2025 2m read

The Power of Indexing in Database Tables

When working with databases, most developers understand the concept of an index and why it's used: to speed up data retrieval. But the real impact of indexing often becomes clear only when you compare scenarios with and without it.

Do You Know What Happens Without an Index?
Imagine a table with three columns: Name, Age, and MobileNumber.


Now, consider this query:

If the Age column does not have an index, the database engine will:

  • Check if the WHERE condition field has an index.
  • If not, it will scan the entire table (a full table scan).
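The difference between the two access paths can be sketched in Python. This is a conceptual illustration only, not how any database engine stores its indexes: the rows are made-up sample data, and `full_scan`, `build_age_index`, and `indexed_lookup` are hypothetical helper names.

```python
from collections import defaultdict

# Made-up sample rows using the three columns from the example table.
rows = [
    {"Name": "Ann", "Age": 30, "MobileNumber": "555-0101"},
    {"Name": "Bob", "Age": 45, "MobileNumber": "555-0102"},
    {"Name": "Cara", "Age": 30, "MobileNumber": "555-0103"},
    {"Name": "Dev", "Age": 52, "MobileNumber": "555-0104"},
]


def full_scan(rows, age):
    """Without an index: every row is examined to evaluate the condition."""
    examined = 0
    matches = []
    for row in rows:
        examined += 1
        if row["Age"] == age:
            matches.append(row["Name"])
    return matches, examined


def build_age_index(rows):
    """Build a simple index: Age value -> positions of the matching rows."""
    index = defaultdict(list)
    for position, row in enumerate(rows):
        index[row["Age"]].append(position)
    return index


def indexed_lookup(rows, index, age):
    """With an index: only the rows listed under the requested Age are read."""
    positions = index.get(age, [])
    return [rows[p]["Name"] for p in positions], len(positions)


age_index = build_age_index(rows)
print(full_scan(rows, 30))                  # examines all four rows
print(indexed_lookup(rows, age_index, 30))  # reads only the two matching rows
```

The index costs extra storage and must be maintained on every insert or update, which is the trade-off for touching fewer rows at query time.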
Article Timothy Scott · Feb 28, 2025 7m read

High-Performance Message Searching in Health Connect

The Problem

Have you ever tried to do a search in Message Viewer on a busy interface and had the query time out? This can become quite a problem as the amount of data increases. For context, the instance of Health Connect I am working with processes roughly 155 million Message Headers per day with 21-day message retention. To help with search performance, we extended the built-in SearchTable with commonly used fields, hoping that indexing these fields would result in faster query times. Despite this, we still couldn't get some of these queries to finish at all.

Article Harshitha · Oct 22, 2025 2m read

Hello community,

I wanted to share my experience of working on large data projects. Over the years, I have had the opportunity to handle massive patient data, payor data, and transactional logs while working in the hospital industry. I have had the chance to build huge reports that required advanced logic to fetch data across multiple tables whose indexing did not help me write efficient code.

Here is what I have learned about managing large data efficiently.

Choosing the right data access method.

As we in the community are all aware, IRIS provides multiple ways to access data. Choosing the right method depends on the requirement.

  • Direct Global Access: Fastest for bulk read/write operations. For example, if I have to traverse indexes and fetch patient data, I can loop through the globals to process millions of records. This saves a lot of time.
; Walk the "Date" index for entries dated after yesterday, up to today
Set ToDate=+$H
Set FromDate=+$H-1 For  Set FromDate=$O(^PatientD("Date",FromDate)) Quit:(FromDate="")||(FromDate>ToDate)  Do
. Set PatId="" For  Set PatId=$Order(^PatientD("Date",FromDate,PatId)) Quit:PatId=""  Do
. . Write $Get(^PatientD("Date",FromDate,PatId)),!
  • Using SQL: Useful for reporting or analytical requirements, though slower for huge data sets.
Article Benjamin De Boe · Jun 19, 2025 10m read

This article describes a significant enhancement in the 2025.2 release to how InterSystems IRIS deals with table statistics, a crucial element for IRIS SQL processing. We'll start with a brief refresher on what table statistics are, how they are used, and why we needed this enhancement. Then, we'll dive into the details of the new infrastructure for collecting and saving table statistics, after which we'll zoom in on what the change means in practice for your applications. We'll end with a few additional notes on patterns enabled by the new model, and look forward to the follow-on phases of this initial delivery.

Article Guillaume Rongier · Jul 28, 2025 3m read


This will be a short article about Python dunder methods, also known as magic methods.

What are Dunder Methods?

Dunder methods are special methods in Python that start and end with double underscores (__). They allow you to define the behavior of your objects for built-in operations, such as addition, subtraction, string representation, and more.

Some common dunder methods include:

  • __init__(self, ...): Called when an object is created.
    • Like our %OnNew method in ObjectScript.
  • __str__(self): Called by the str() built-in function and print to represent the object as a
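As a minimal sketch of the two methods listed above, assuming a hypothetical `Patient` class that is not part of the original article:

```python
class Patient:
    """Hypothetical example class used only to show two dunder methods."""

    def __init__(self, name, age):
        # Called when Patient(...) creates a new object.
        self.name = name
        self.age = age

    def __str__(self):
        # Called by str() and print() to get a readable representation.
        return f"{self.name} ({self.age})"


patient = Patient("Ann", 30)
print(patient)       # print() goes through __str__
text = str(patient)  # so does the str() built-in
```

Neither method is called by name here: Python invokes them in response to object creation and to `str()` or `print()`.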
Article Joe Fu · Mar 7, 2025 2m read

We recently changed the "UserID" property in a "User" class from type %String to %Library.Username. This is for better consistency across our codebase regarding the MAXLEN limit.

%Library.Username is a system wrapper datatype which extends %String and has a MAXLEN of 160. This change should have minimal or no impact on code behavior. However, we found that some SQL queries no longer returned the expected rows after the change. A query would return empty results even though the entry was in the table.

Article Robert Cemper · Jul 8, 2023 2m read

A recent question from @Vivian Lee reminded me of a rather ancient example.
It was the time when DeepSee's first version was released.
We got Bitmap Index.
And we got BitSlice Index: mapping a numeric value by its binary parts.
So my idea: Why not index strings by their characters?
The result of this idea was presented first in June 2008. 
iKnow wasn't publicly available at that time.
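The general bitslice idea, one bitmap per binary digit of the indexed number, can be sketched in Python. This is a generic illustration and not the storage format IRIS actually uses; `build_bitslice` and `bitslice_sum` are hypothetical names, Python integers stand in for the bitmaps, and the values are assumed to be non-negative integers.

```python
def build_bitslice(values):
    """Return one bitmap (a Python int) per bit position.

    Bit r of slices[i] is set when the value in row r has bit i set.
    Values are assumed to be non-negative integers.
    """
    width = max(values).bit_length() if values else 0
    slices = [0] * width
    for row, value in enumerate(values):
        for bit in range(width):
            if (value >> bit) & 1:
                slices[bit] |= 1 << row
    return slices


def bitslice_sum(slices):
    """Sum all indexed values by counting set bits in each slice."""
    return sum((1 << bit) * bin(bitmap).count("1")
               for bit, bitmap in enumerate(slices))


values = [5, 3, 10]            # made-up sample column
slices = build_bitslice(values)
print(slices)                  # one bitmap per binary digit
print(bitslice_sum(slices))    # same result as sum(values)
```

An aggregate such as a sum can be answered from the slices alone, without reading the rows. Indexing strings by their characters applies the same decomposition to text, with one bitmap per character instead of one per binary digit.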

Article Timothy Leavitt · Feb 21, 2024 9m read

Suppose you have an application that allows users to write posts and comment on them. (Wait... that sounds familiar...)

For a given user, you want to be able to list all of the published posts with which that user has interacted - that is, either authored or commented on. How do you make this as fast as possible?

Here's what our %Persistent class definitions might look like as a starting point (storage definitions are important, but omitted for brevity):

Class DC.
Article Mihoko Iijima · Aug 31, 2023 1m read

InterSystems FAQ rubric

By specifying the start and end values of the IDs for which you want to rebuild indexes in the arguments of the %BuildIndices() method provided in the persistent class (=table) definition, you can rebuild only the indexes within that range.

For example, to rebuild the NameIDX index and ZipCode index in the Sample.Person class only for ID=10 to 20, execute the following code (the ID range is specified in the 5th and 6th arguments). 

 set status = ##class(Sample.Person).%BuildIndices($LB("NameIDX","ZipCode"),1,,1,10,20)

$LB() is the $ListBuild() function.

Article Mihoko Iijima · Jun 29, 2023 3m read

InterSystems FAQ rubric

For volatile tables (tables with many INSERTs and DELETEs), storage for bitmap indexes can become inefficient over time.

For example, suppose a table with the following definition holds thousands of records, and those records are repeatedly bulk-deleted with TRUNCATE TABLE after being retained for a certain period of time.

Class MyWork.
Article Timothy Leavitt · Jun 28, 2022 2m read

An interesting pattern around unique indices came up recently (in internal discussion re: isc.rest) and I'd like to highlight it for the community.

As a motivating use case: suppose you have a class representing a tree, where each node also has a name, and we want nodes to be unique by name and parent node. We want each root node to have a unique name too. A natural implementation would be:

Class DC.Demo.Node Extends %Persistent
{

Property Parent As DC.Demo.
Article José Pereira · Feb 2, 2021 12m read

Image search like Google's is a nice feature that fascinates me, as does almost anything related to image processing.

A few months ago, InterSystems released a preview of Embedded Python. As Python has a lot of libraries for dealing with image processing, I decided to make my own attempt at a sort of image search - a much more modest version indeed :-)


---

A taste of theory 🤓

To build an image search system, it is first necessary to select a set of features to be extracted from images - these features are also called descriptors.

Article Allyson Gerace · Feb 6, 2019 8m read

See Part 1 here.

Part 2: Index Handling

Now you have a good idea of what kind of indices you need for your class and how to define them. Next, how do you handle them?

Query Plan

(REMEMBER: Like any modifications to a class, adding indices in a live system has its risks – if users are accessing or updating data while an index is populated, they may encounter empty or incorrect query results, or even corrupt the indices that are being built.

Article Allyson Gerace · Feb 6, 2019 13m read

This is the first in a pair of articles on SQL indices.

Part 1 - Know your indices

What is an index, anyway?

Picture the last time you went to a library. Typically they have books sorted by subject matter (and then author and title), and each shelf has an end-plate with a code describing the subject of its books. If you wanted to collect books of a certain subject, instead of walking across every aisle and reading the inside cover of every book, you could head straight for the bookshelf labelled with your desired subject matter and choose your books.

Article Jean Millette · Aug 22, 2019 3m read

Our team is reworking an application to use REST services that use the same database as our current ZEN application. One of the new REST endpoints uses a query that ran very slowly when first implemented. After some analysis, we found that an index on one of the fields in the table greatly improved performance (a query that took 35 seconds was now taking a fraction of a second).

We saw this improvement on our development system and our test system. However, when we moved the code to the production system, the query still took “forever”. What went wrong?

Article Sergey Kamenev · Jul 7, 2017 7m read

In the previous parts (1, 2) we talked about globals as trees. In this article, we will look at them as sparse arrays.

A sparse array is a type of array in which most elements have the same value.

In practice, you will often see sparse arrays so huge that there is no point in occupying memory with identical elements. Therefore, it makes sense to organize sparse arrays in such a way that memory is not wasted on storing duplicate values.

In some programming languages, sparse arrays are part of the language - for example, in J, MATLAB. In other languages, there are special libraries that let you use them. For C++, those would be Eigen and the like.
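To make the idea concrete, here is a minimal Python sketch that stores only the elements differing from the common value. The `SparseArray` class is hypothetical and is not taken from the article or from any of the libraries mentioned above.

```python
class SparseArray:
    """Fixed-length array that stores only the non-default elements."""

    def __init__(self, length, default=0):
        self.length = length
        self.default = default
        self._cells = {}  # index -> value, only for non-default values

    def _check(self, index):
        if not 0 <= index < self.length:
            raise IndexError(index)

    def __getitem__(self, index):
        self._check(index)
        return self._cells.get(index, self.default)

    def __setitem__(self, index, value):
        self._check(index)
        if value == self.default:
            # Writing the default value frees the cell instead of storing it.
            self._cells.pop(index, None)
        else:
            self._cells[index] = value

    def stored(self):
        """Number of elements that actually occupy memory."""
        return len(self._cells)


arr = SparseArray(1_000_000)
arr[10] = 7
arr[999_999] = 3
print(arr[10], arr[500], arr.stored())
```

A million-element array with two non-default values keeps only two entries in memory, and reading any other position simply returns the default.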

Globals are good candidates for implementing sparse arrays for the following reasons:

Article Kyle Baxter · Sep 9, 2016 5m read

Have some free text fields in your application that you wish you could search efficiently?  Tried using some methods before but found out that they just cannot match the performance needs of your customers?  Do I have one weird trick that will solve all your problems?  Don’t you already know!?  All I do is bring great solutions to your performance pitfalls!

As usual, if you want the TL;DR (too long; didn’t read) version, skip to the end.  Just know you are hurting my feelings.

If you open up your version of Sample.

Article Vitaliy Serdtsev · Jul 7, 2017 19m read

Quotes (1NF/2NF/3NF):

Every row-and-column intersection contains exactly one value from the applicable domain (and nothing else). The same value can be atomic or non-atomic depending on the purpose of this value. For example, “4286” can be
  • atomic, if it denotes “a credit card’s PIN code” (if it’s broken down or reshuffled, it is of no use any longer)
  • non-atomic, if it’s just a “sequence of numbers” (the value still makes sense if broken down into several parts or reshuffled)

This article explores the standard methods of increasing the performance of SQL queries involving the following types of fields: string, date, simple list (in the $LB format), "list of <...>" and "array of <...>".

Article Michael Braam · Feb 20, 2017 14m read

Overview

Encryption of sensitive data is becoming more and more important for applications: for example, patient names, SSNs, address data, or credit card numbers.

Caché supports two flavors of encryption: block-level database encryption and data-element encryption. Block-level database encryption protects an entire database. Encryption/decryption is done when a block is written to or read from the database and has very little impact on performance.

With data-element encryption, only certain data fields are encrypted: fields that contain sensitive data like patient data or credit card numbers. Data-element encryption is also useful if re-encryption is required periodically. With data-element encryption, it is the responsibility of the application to encrypt/decrypt the data.

Both encryption methods leverage the managed key encryption infrastructure of Caché.

The following article describes a sample use-case where data-element encryption is used to encrypt person data.  

But what if you have hundreds of thousands of records with an encrypted datafield and you have the need to search that field? Decryption of the field-values prior to the search is not an option. What about indices?

This article describes a possible solution and develops, step by step, a small example of how you can use SQL and indices to search encrypted fields.

Article Benjamin De Boe · Jun 28, 2016 7m read

Earlier in this series, we've presented four different demo applications for iKnow, illustrating how its unique bottom-up approach allows users to explore the concepts and context of their unstructured data and then leverage these insights to implement real-world use cases. We started small and simple with core exploration through the Knowledge Portal, then organized our records according to content with the Set Analysis Demo, organized our domain knowledge using the Dictionary Builder Demo, and finally built complex rules to extract nontrivial patterns from text with the Rules Builder Demo.

This time, we'll dive into a different area of the iKnow feature set: iFind. Where iKnow's core APIs are all about exploration and leveraging those results programmatically in applications and analytics, iFind is focused specifically on search scenarios in a pure SQL context. We'll be presenting a simple search portal implemented in Zen that showcases iFind's main features.

Article Alexander Koblov · Jan 29, 2016 9m read

The object and relational data models of the Caché database support three types of indexes, which are standard, bitmap, and bitslice. In addition to these three native types, developers can declare their own custom types of indexes and use them in any classes since version 2013.1. For example, iFind text indexes use that mechanism.
