Article
· 12 hr ago 2m read
Tips on handling Large data

Hello community,

I wanted to share my experience about working on Large Data projects. Over the years, I have had the opportunity to handle massive patient data, payor data and transactional logs while working in an hospital industry. I have had the chance to build huge reports which had to be written using advanced logics fetching data across multiple tables whose indexing was not helping me write efficient code.

Here is what I have learned about managing large data efficiently.

Choosing the right data access method.

As we all here in the community are aware of, IRIS provides multiple ways to access data. Choosing the right method, depends on the requirement.

  • Direct Global Access: Fastest for bulk read/write operations. For example, if i have to traverse through indexes and fetch patient data, I can loop through the globals to process millions of records. This will save a lot of time.
Set ToDate=+H
Set FromDate=+$H-1 For  Set FromDate=$O(^PatientD("Date",FromDate)) Quit:FromDate>ToDate  Do
. Set PatId="" For  Set PatId=$Order(^PatientD("Date",FromDate,PatID)) Quit:PatId=""  Do
. . Write $Get(^PatientD("Date",FromDate,PatID)),!
  • Using SQL: Useful for reporting or analytical requirements, though slower for huge data sets.

1 6
0 44

Hello!!!

Data migration often sounds like a simple "move data from A to B task" until you actually do it. In reality, it is a complex process that blends planning, validation, testing, and technical precision.

Over several projects where I handled data migration into a HIS which runs on IRIS (TrakCare), I realized that success comes from a mix of discipline and automation.

Here are a few points which I want to highlight.

1. Start with a Defined Data Format.

Before you even open your first file, make sure everyone, especially data providers, clearly understands the exact data format you expect. Defining templates early avoids unnecessary bank-and-forth and rework later.

While Excel or CSV formats are common, I personally feel using a tab-delimited text file (.txt) for data upload is best. It's lightweight, consistent, and avoids issues with commas inside text fields.

PatID   DOB Gender  AdmDate
10001   2000-01-02  M   2025-10-01
10002   1998-01-05  F   2025-10-05
10005   1980-08-23  M   2025-10-15

Make sure that the date formats given in the file is correct and constant throughout the file because all these files are usually converted from an Excel file and an Basic excel user might make mistakes while giving you the date formats wrong. Wrong date formats can irritate you while converting into horolog.

2 8
1 56

Technical Documentation — Quarkus IRIS Monitor System

1. Purpose and Scope

This module enables integration between Quarkus-based Java applications and InterSystems IRIS’s native performance monitoring capabilities.
It allows a developer to annotate methods with @PerfmonReport, which triggers IRIS’s ^PERFMON routines automatically around method execution, generating performance reports without manual intervention.

1 1
0 32

This article describes a significant enhancement of how InterSystems IRIS deals with table statistics, a crucial element for IRIS SQL processing, in the 2025.2 release. We'll start with a brief refresher on what table statistics are, how they are used, and why we needed this enhancement. Then, we'll dive into the details of the new infrastructure for collecting and saving table statistics, after which we'll zoom in onto what the change means in practice for your applications. We'll end with a few additional notes on patterns enabled by the new model, and look forward to the follow-on phases of this initial delivery.

14 6
4 207

This article outlines the process of utilizing the renowned Jaeger solution for tracing InterSystems IRIS applications. Jaeger is an open-source product for tracking and identifying issues, especially in distributed and microservices environments. This tracing backend that emerged at Uber in 2015 was inspired by Google's Dapper and Twitter's OpenZipkin. It later joined the Cloud Native Computing Foundation (CNCF) as an incubating project in 2017, achieving graduated status in 2019. This guide will demonstrate how to operate the containerized Jaeger solution integrated with IRIS.

8 2
5 165

In IRIS, every time a request need to be processed, a specific IRIS process (IRISDB.EXE) need to be assigned to handle that request.

If there is no spare IRIS process at that time, process will need to be created (and later destroyed). On some Windows systems (especially with security/antimalware solutions being active) creating new processes can be slow, which can results in delays during peak times (due to inrush of requests).

0 2
0 67

Having been inspired with Shared code execution speed question/discussion, I dare to ask another one which is annoying me and my colleagues for several weeks.

We have a routine called Lib that comprises 200 $$-functions of 1500 code lines total. It was noticed that after calling _any_ function of another rather big routine (1900 functions, 32000 lines) the next call of $$someFunction^Lib(x) is getting 10-20% slower than previous call of the same function. This effect doesn't depend on:

0 16
0 245

Hey everyone,

I'm diving deeper into Caché ObjectScript and would love to open a discussion around the most useful tips, tricks, and best practices you’ve learned or discovered while working with it.

Whether you're an experienced developer or just getting started, ObjectScript has its own set of quirks and powerful features—some well-documented, others hidden gems. I’m looking to compile a helpful set of ideas from the community.

Some areas I’m especially interested in:

6 3
4 131

Introduction

MonLBL is a tool for analyzing the performance of ObjectScript code execution line by line. codemonitor.MonLBL is a wrapper based on the %Monitor.System.LineByLine package from InterSystems IRIS, designed to collect precise metrics on the execution of routines, classes, or CSP pages.

The wrapper and all examples presented in this article are available in the following GitHub repository: iris-monlbl-example

7 1
2 202

Let's suppose two different routines use one and the same chunk of code. From the object-oriented POV, a good decision is to have this chunk of code in a separate class and have both routines call it. However, whenever you call code outside of the routine as opposed to calling code in the same routine, some execution speed is lost. For reports churning through millions of transactions this lost speed might be noticeable. Any advice how to optimize specifically speed?

0 14
0 138

Motivation

I didn't know about ObjectScript until I started my new job. Objectscript isn't actually a young programming language. Compared to C++, Java and Python, the community isn't as active, but we're keen to make this place more vibrant, aren't we?

I've noticed that some of my colleagues are finding it tricky to get their heads around the class relationships in these huge projects. There aren't any easy-to-use modern class diagram tool for ObjectScript.

Related Work

I have tried relavant works:

12 7
4 374

Introduction

Database performance has become a critical success factor in a modern application environment. Therefore identifying and optimizing the most resource-intensive SQL queries is essential for guaranteeing a smooth user experience and maintaining application stability.

This article will explore a quick approach to analyzing SQL query execution statistics on an InterSystems IRIS instance to identify areas for optimization within a macro-application.

Rather than focusing on real-time monitoring, we will set up a system that collects and analyzes statistics pre-calculated by IRIS once an hour. This approach, while not enabling instantaneous monitoring, offers an excellent compromise between the wealth of data available and the simplicity of implementation.

We will use Grafana for data visualization and analysis, InfluxDB for time series storage, and Telegraf for metrics collection. These tools, recognized for their power and flexibility, will allow us to obtain a clear and exploitable view.

More specifically, we will detail the configuration of Telegraf to retrieve statistics. We will also set up the integration with InfluxDB for data storage and analysis, and create customized dashboards in Grafana. This will help us quickly identify queries requiring special attention.

To facilitate the orchestration and deployment of these various components, we will employ Docker.

logos.png

6 0
3 257

High-Performance Message Searching in Health Connect

The Problem

Have you ever tried to do a search in Message Viewer on a busy interface and had the query time out? This can become quite a problem as the amount of data increases. For context, the instance of Health Connect I am working with does roughly 155 million Message Headers per day with 21 day message retention. To try and help with search performance, we extended the built-in SearchTable with commonly used fields in hopes that indexing these fields would result in faster query times. Despite this, we still couldn't get some of these queries to finish at all.

17 0
6 139

Here at InterSystems, we often deal with massive datasets of structured data. It’s not uncommon to see customers with tables spanning >100 fields and >1 billion rows, each table totaling hundred of GB of data. Now imagine joining two or three of these tables together, with a schema that wasn’t optimized for this specific use case. Just for fun, let’s say you have 10 years worth of EMR data from 20 different hospitals across your state, and you’ve been tasked with finding….

7 1
3 216

The rise of Big Data projects, real-time self-service analytics, online query services, and social networks, among others, have enabled scenarios for massive and high-performance data queries. In response to this challenge, MPP (massively parallel processing database) technology was created, and it quickly established itself. Among the open-source MPP options, Presto (https://prestodb.io/) is the best-known option. It originated in Facebook and was utilized for data analytics, but later became open-sourced.

6 0
3 276