
Tips on Handling Large Data

Hello community,

I wanted to share my experience working on large data projects. Over the years, I have had the opportunity to handle massive patient data, payor data, and transactional logs while working in the hospital industry. I have had to build huge reports with advanced logic that fetched data across multiple tables whose indexing did not help me write efficient code.

Here is what I have learned about managing large data efficiently.

Choosing the Right Data Access Method

As we all in the community are aware, IRIS provides multiple ways to access data. Choosing the right method depends on the requirement.

  • Direct Global Access: Fastest for bulk read/write operations. For example, if I have to traverse an index and fetch patient data, I can loop through the globals to process millions of records. This saves a lot of time.
 ; Walk the date index from yesterday (FromDate) through today (ToDate), then each patient under that date
Set ToDate=+$H,FromDate=ToDate-1
Set Date=FromDate-1 For  Set Date=$Order(^PatientD("Date",Date)) Quit:(Date="")||(Date>ToDate)  Do
. Set PatId="" For  Set PatId=$Order(^PatientD("Date",Date,PatId)) Quit:PatId=""  Do
. . Write $Get(^PatientD("Date",Date,PatId)),!
  • Using SQL: Useful for reporting or analytical requirements, though slower for huge data sets (see the sketch right after this list).
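
As a minimal sketch of the SQL route, here is how dynamic SQL could be used; the Hospital.Patient table and its AdmitDate and Name columns are hypothetical names for illustration, so adjust them to your own schema:

 ; Minimal sketch: dynamic SQL over a hypothetical Hospital.Patient table
Set stmt=##class(%SQL.Statement).%New()
Set sc=stmt.%Prepare("SELECT ID, Name FROM Hospital.Patient WHERE AdmitDate >= ?")
If $System.Status.IsError(sc) Write "Prepare failed",! Quit
Set rs=stmt.%Execute($ZDate(+$H-1,3))  ; admissions since yesterday (YYYY-MM-DD)
While rs.%Next() { Write rs.%Get("ID")," ",rs.%Get("Name"),! }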

Streamlining Bulk Operations

Handling millions of records one at a time is slow and resource-heavy. To optimize, I have found that saving in batches, using temporary globals for intermediate steps, and breaking large jobs into smaller chunks make a huge difference. Turning off non-essential indices during bulk inserts and rebuilding them afterwards also speeds things up. A rough sketch of the batching idea is below.
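
Here is a minimal sketch of that pattern, staging rows in a process-private global and committing in batches. Hospital.Patient, its Name property, and the batch size of 1,000 are hypothetical choices for illustration, not a fixed recipe:

 ; Stage rows in a process-private global, then persist them in batches
 ; Hospital.Patient is a hypothetical persistent class used for illustration
Set batchSize=1000,n=0
TSTART
Set key="" For  Set key=$Order(^||Staged(key),1,name) Quit:key=""  Do
. Set pat=##class(Hospital.Patient).%New()
. Set pat.Name=name
. Set sc=pat.%Save() If 'sc Write "Save failed for ",key,!
. Set n=n+1 If n#batchSize=0 TCOMMIT  TSTART  ; keep each transaction small
TCOMMIT
Kill ^||Staged  ; the intermediate staging data is discarded once the run is done

Keeping transactions small avoids huge journal and lock buildups, and the process-private global disappears on its own when the process ends.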

Using Streams

For large text, XML, or JSON payloads, stream objects prevent memory overload. Dealing with huge files can consume a lot of memory if everything is loaded at once, so I prefer stream objects to read or write the data in chunks. This keeps things fast and efficient.

Set file=##class(%Stream.FileCharacter).%New()
Do file.LinkToFile("C:\Desktop\HUGEDATA.json")
Set stream=##class(%Stream.GlobalCharacter).%New()
Do stream.CopyFrom(file)  ; copy the file contents into a global character stream
Write "Size: ",stream.Size,!

This is a simple way of handling the data safely without slowing down the system.
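
The stream can then be processed in chunks instead of being pulled into memory all at once; a minimal sketch, where the 32,000-character chunk size is an arbitrary choice:

Do stream.Rewind()
While 'stream.AtEnd {
    Set chunk=stream.Read(32000)  ; read up to 32,000 characters at a time
    ; process the chunk here, e.g. write it to another stream or parse incrementally
}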

To wrap up: handling huge data is not just about making things fast, it's about choosing the right way to access and store it while keeping the system balanced.

From migrating millions of patient records to building APIs that handle quite large datasets, these approaches have made a real difference in performance and maintainability.

If you are working with similar concepts and want to swap ideas, please feel free to reach out; I am always happy to share what has worked for me. I am open to your feedback and opinions too.


Thanks!!! :-)

Discussion (3)

Yes Vachan, thanks for pointing that out! True, simple loops over globals can get slow if you are not careful. In my experience, combining direct global access with $Order or streaming techniques keeps things efficient.

So yes, loops always add complexity, but done the right way, they can actually be faster than going through classes/SQL for large datasets.