#Big Data

1 Follower · 46 Posts

Big data is a field that treats ways to analyze and systematically extract information from data sets that are too large or complex for traditional data-processing software. Big data challenges include capturing data, data storage, data analysis, search, sharing, transfer, visualization, querying, updating, information privacy, and data source.

Article Benjamin De Boe · Apr 2 2m read

Dealing with Very Large Data in IRIS 2026.1

What’s New in InterSystems IRIS and IRIS for Health 2026.1

InterSystems IRIS 2026.1 is here, and it’s packed with powerful enhancements designed to help organizations scale their data management like never before. Whether you’re dealing with the operational aspects of managing massive datasets or looking to optimize storage costs, this release brings a host of features to simplify life with your data and meet the growing challenges of very large datasets.

#InterSystems IRIS #Big Data #Databases #Relational Tables #SQL

Article Yuri Marx · Nov 18, 2025 11m read

Data Streaming with InterSystems IRIS Interoperability

Modern data architectures use real-time data capture, transformation, movement, and loading solutions to build data lakes, analytical warehouses, and big data repositories. These solutions enable the analysis of data from various sources without impacting the operations that use them. To achieve this, a continuous, scalable, elastic, and robust data flow is essential. The most prevalent method for establishing it is the CDC (Change Data Capture) technique. CDC monitors the source for newly produced small sets of data, captures them automatically, and delivers them to one or more recipients, including analytical data repositories. The major benefit is the elimination of the D+1 delay in analysis, because data is detected at the source as soon as it is produced and is then replicated to the destination.

This article demonstrates the most common CDC scenarios on both the source side and the destination side. For the data source (origin), we will explore CDC in SQL databases and CSV files. For the data destination, we will use a columnar database (a typical high-performance analytical database scenario) and a Kafka topic (a standard approach for streaming data to the cloud and/or to multiple real-time data consumers).
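The capture-and-deliver idea behind CDC can be sketched in a few lines. This is not the article's implementation: it is a minimal illustration of polling-based capture with a watermark (the highest key already seen), using Python's built-in sqlite3 as a stand-in source. The table name source_events, the column names, and the two in-memory "destinations" are assumptions made for illustration only; production CDC tools often read the database log instead of polling.

```python
import sqlite3


def capture_changes(conn, last_seen_id):
    """Return rows added after last_seen_id, plus the updated watermark."""
    rows = conn.execute(
        "SELECT id, payload FROM source_events WHERE id > ? ORDER BY id",
        (last_seen_id,),
    ).fetchall()
    # Keep the old watermark when nothing new was produced.
    new_watermark = rows[-1][0] if rows else last_seen_id
    return rows, new_watermark


def deliver(rows, destinations):
    """Copy the captured rows to every recipient (plain lists here)."""
    for destination in destinations:
        destination.extend(rows)


# Hypothetical source table, created in memory for the example.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE source_events (id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT)"
)
conn.executemany(
    "INSERT INTO source_events (payload) VALUES (?)",
    [("order-1",), ("order-2",)],
)

analytics_copy = []  # stands in for an analytical repository
stream_copy = []     # stands in for a streaming consumer
watermark = 0

# First poll picks up the two existing rows.
batch, watermark = capture_changes(conn, watermark)
deliver(batch, [analytics_copy, stream_copy])

# A new row appears at the source; the next poll captures only that row.
conn.execute("INSERT INTO source_events (payload) VALUES (?)", ("order-3",))
batch, watermark = capture_changes(conn, watermark)
deliver(batch, [analytics_copy, stream_copy])
```

Because each poll asks only for keys above the watermark, the source is never rescanned in full, which is what keeps the impact on the operational system low.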

Overview

This article will provide a sample for the following interoperability scenario:

#InterSystems IRIS #Analytics #Big Data #Business Operation #Business Service #Columnar Storage #Dashboards #Interoperability

Article Timothy Scott · Feb 28, 2025 7m read

High-Performance Message Searching in Health Connect

The Problem

Have you ever tried to do a search in Message Viewer on a busy interface and had the query time out? This can become quite a problem as the amount of data increases. For context, the instance of Health Connect I am working with handles roughly 155 million Message Headers per day with 21-day message retention. To help with search performance, we extended the built-in SearchTable with commonly used fields, hoping that indexing these fields would result in faster query times. Despite this, we still couldn't get some of these queries to finish at all.

#Health Connect #Big Data #HL7 #Indexing #Interoperability #Message Search #Performance #SQL #Tips & Tricks

Question Kanishk Mittal · Jul 28, 2025

Schema Design Best Practices for Cross-Departmental Data Lakes in IRIS

We’re building out a data lake in IRIS 2025.1 that aggregates data across multiple business systems and departments. I’m trying to establish best practices for schema design and separation.

Right now, I’m thinking of using a separate schema for each distinct system of record feeding into the data lake - for example, one schema per upstream source system, rather than splitting based on function (e.g. staging, raw, curated). The idea is that this would make it easier to manage source ownership, auditing, and pipeline logic, especially when multiple domains are contributing data.

#InterSystems IRIS #InterSystems IRIS BI (DeepSee) #Access control #Big Data #Databases #Unstructured Data

Discussion Raj Singh · Apr 10, 2025

Are you using Jupyter Notebooks with IRIS?

Are you using Jupyter Notebooks with IRIS? Are you using the vscode-iris-jupyter-server VS Code extension for your notebooking? If so, please let me know either via direct message or with a comment on this post. I'd like to hear more about how our customers are working with this tool specifically, and with data science more generally.

Thanks!

#InterSystems IRIS #Big Data #Embedded Python #Python #VSCode

Question Michael Davidovich · Apr 10, 2025

Performant SQL For Paging Results (DataTables, Select2, etc.)

Hello,

Our software commonly returns a full result set to the client and we use the DataTables plugin to display table data. This has worked well, but as datasets grow larger, we are trying to move some of these requests server-side so the server handles the bulk of the work rather than the client. This has had me scratching my head in so many ways.

I'm hoping I can get a mix of general best-practice advice and maybe some IRIS-specific ideas.
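One general technique often used for server-side paging is keyset (seek) pagination: the server returns one page at a time and uses the last key of the previous page as the starting point for the next. The sketch below is an illustration only, not an IRIS-specific answer. It uses Python's built-in sqlite3 as a stand-in database; the items table, its columns, and the fetch_page helper are hypothetical, and the LIMIT clause is SQLite syntax that may need adapting to another SQL dialect.

```python
import sqlite3


def fetch_page(conn, after_id, page_size):
    """Return one page of rows with id greater than after_id, plus the next cursor."""
    rows = conn.execute(
        "SELECT id, name FROM items WHERE id > ? ORDER BY id LIMIT ?",
        (after_id, page_size),
    ).fetchall()
    # A full page means more rows may follow; a short page means the end was reached.
    next_cursor = rows[-1][0] if len(rows) == page_size else None
    return rows, next_cursor


# Hypothetical table with 25 rows, created in memory for the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO items (id, name) VALUES (?, ?)",
    [(i, f"item-{i}") for i in range(1, 26)],
)

# Walk through all pages, 10 rows at a time.
pages = []
cursor = 0
while cursor is not None:
    rows, cursor = fetch_page(conn, cursor, 10)
    pages.append(rows)
```

Seeking by key avoids scanning and discarding all earlier rows on every request, which is the usual cost of offset-based paging on large tables. The trade-off is that the client can only move page by page in key order rather than jump to an arbitrary page number.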

#InterSystems IRIS #InterSystems IRIS for Health #Big Data #Databases #Data Model #JavaScript #SQL

Article sween · Jul 23, 2024 4m read

Databricks Station - InterSystems Cloud SQL

A Quick Start to InterSystems Cloud SQL Data in Databricks

#InterSystems IRIS #Big Data #Cloud #Java #JDBC #Python #SQL #SSL

Question Steven Henry Suhendra · Dec 2, 2024

How to use order by in Query %DLIST ?

Hello My Friends,

I have a question about how to use ORDER BY with %DLIST. This is my code:

SELECT
$ListToString(%DLIST(DISTINCT MRDIA_ICDCode_DR->MRCID_Code),', ' ) ICDX,
$ListToString(%DLIST(DISTINCT (MRDIA_ICDCode_DR->MRCID_Desc || ' (' || MRDIA_DiagnosisType_DR->DTYP_Code || ')')),', ' ) Diagnose
FROM SQLUser.PA_Adm
LEFT JOIN SQLUser.PA_AdmInsurance ON (PAADM_RowID = INS_ParRef AND INS_Rank = 1)
LEFT JOIN SQLUser.PA_AdmPackage ON (PAADM_RowID = PACK_ParRef)
LEFT JOIN SQLUser.MR_Adm on MRADM_ADM_DR = PAADM_RowID
LEFT JOIN SQLUser.MR_Diagnos ON MRADM_RowId = MRDIA_MRADM_ParRef
LEFT JOIN SQLUser.

#TrakCare #InterSystems IRIS #Other #.NET #Big Data #SQL

Announcement Anastasia Dyubaylo · Nov 22, 2024

[Video] Big Data, AI and ChatGPT in Healthcare

Hi Community,

Enjoy the new video on InterSystems Developers YouTube:

⏯ Big Data, AI and ChatGPT in Healthcare @ Global Summit 2024

#Summit #Artificial Intelligence (AI) #Big Data #Generative AI (GenAI) #Global Summit 2024 #Video

Announcement Jean Millette · Jun 12, 2024

Vote for Microsoft Change For Support of Source Management and CI/CD of Power BI Reports

TL;DR: My comment to Microsoft when I voted: Our team has implemented most of what we need for source management of Power BI Report files in Perforce. The missing piece? Automated creation of ".pbix" files from ".pbit" template files that can be deployed to the Power BI Report Server. Let's get the manual “Power BI Desktop->File->Save As” step out of the process to make report deployment totally automated.

Our team has implemented change management for some critical Power BI Reports used by other teams at InterSystems. We do this by extracting the “.

#InterSystems IRIS #Analytics #Big Data #Change Management #Microsoft Windows #Perforce

Question deng hang · Mar 23, 2024

How to use Python to read journal logs

Hello everyone, I would like to read and parse journal logs without relying on the %SYS.Journal.Record class. I want to use Python to read and parse journal log files and implement the functionality of the %SYS.Journal.Record class.

#InterSystems IRIS for Health #Big Data

Question Flávio Lúcio Naves Júnior · Mar 21, 2024

Do we have any class for work with blockchains?

Hello everyone,
I searched but didn't find anything. Could you tell me if the IRIS database has any class that natively works with blockchains?
Best Regards,
Flávio

#InterSystems IRIS #Big Data #Databases

Article Yuri Marx · Nov 20, 2023 3m read

Parquet files and InterSystems IRIS

In the world of Big Data, selecting the right file format is crucial for efficient data storage, processing, and analysis. With the massive amount of data generated every day, choosing the appropriate format can greatly impact the speed, cost, and accuracy of data processing tasks. There are several file formats available, each with its own set of advantages and disadvantages, making the decision of which one to use complex. Some of the popular Big Data file formats include CSV, JSON, Avro, ORC, and Parquet.

#HealthShare #InterSystems IRIS #InterSystems IRIS for Health #Big Data

Discussion Yuri Marx · Oct 25, 2021

InterSystems IRIS Parquet file support

Hi community,

The Parquet file format is the evolution of the CSV file format. See the differences when you process data in AWS using CSV and Parquet:

Would you like to have support for Parquet files in InterSystems IRIS?

#InterSystems IRIS #Big Data #CSV

Article Yuri Marx · Nov 27, 2023 3m read

Read a parquet file to a JSON file and load in your IRIS repository

According to Databricks, Apache Parquet is an open source, column-oriented data file format designed for efficient data storage and retrieval. It provides efficient data compression and encoding schemes with enhanced performance to handle complex data in bulk. Apache Parquet is designed to be a common interchange format for both batch and interactive workloads. It is similar to other columnar-storage file formats available in Hadoop, namely RCFile and ORC. (source: https://www.databricks.com/glossary/what-is-parquet).

#HealthShare #InterSystems IRIS #InterSystems IRIS for Health #Big Data

Discussion Dmitry Maslennikov · Sep 1, 2023

Data warehouses and data lakes with data from IRIS

In today's landscape, enterprises have grown substantially in scale, amassing vast amounts of data. This data is collected from a plethora of sources, including different applications, databases, and other channels. Given the diversity and volume of this data, it's only logical for these enterprises to seek a deeper understanding of what their data entails. Some of the data can be stored in IRIS, and it is reasonable to want to add this data to a data lake too.

Many different tools are now available for such tasks. They do not yet support IRIS, but that is achievable.

#InterSystems Ideas Portal #InterSystems IRIS #Big Data #SQL

Article Piyush Adhikari · Oct 19, 2022 3m read

Ingestion and Querying Speed Test

The capacity to ingest numerous records every second while simultaneously serving real-time queries is called Hybrid Transactional Analytical Processing (HTAP). It is also called transactional analytics, transanalytics, or translytics, and it is very useful in scenarios with a constant flow of real-time data, such as readings from IIoT sensors or stock market fluctuations, where these data sets must be queried in real time or near real time.

#InterSystems IRIS #Analytics #Big Data #Database Transaction Processing #Data Import and Export

Article Lucas Enard · Aug 24, 2022 7m read

Web Scraping in IRIS using only Python

This GitHub repository is the simplest way to scrape the web using IRIS and Python, all already incorporated in an IRIS production.
From here you can build any IRIS production in full Python or in ObjectScript, as this module is interoperable.
See for more information

1. IRIS-WEB-SCRAPING

1. IRIS-WEB-SCRAPING
2. What is Web Scraping:
- 2.1. The popular libraries/tools used for web scraping are:
- 2.2. The BS4 tool
3.
- 3.1. The Production
- 3.2. Step 1 : Find the URL of the webpage that you want to scrape.

#InterSystems IRIS #VSCode #Big Data #Databases #Python #Tools #Visualization

Announcement Anastasia Dyubaylo · Jun 6, 2022

[Video] InterSystems HealthShare Analytics Solution: Create & Deliver Real-Time Insight at Scale

Hey Developers,

A new video is already on the InterSystems Developers YouTube channel:

⏯ InterSystems HealthShare Analytics Solution: Create & Deliver Real-Time Insight at Scale

#HealthShare #Analytics #Big Data #Video #Virtual Summit 2021

Announcement Anastasia Dyubaylo · May 15, 2022

[Video] Introducing the Smart Data Fabric

Hey Community,

This session provides more detail about the Smart Data Fabric announcement at #VSummit'21:

⏯ Introducing the Smart Data Fabric

#InterSystems IRIS #Big Data #Video #Virtual Summit 2021

Announcement Anastasia Dyubaylo · Apr 6, 2022

InterSystems at "J On The Beach" International Conference in Málaga, Spain

Hey Developers,

Good news! One more upcoming in-person event is nearby.

We're pleased to invite you to join "J On The Beach", an international rendezvous for developers and DevOps around Big Data technologies. A fun conference to learn and share the latest experiences, tips & tricks related to Big Data technologies, and, the most important part, it’s On The Beach!

🗓 April 27-29, 2022

📍Málaga, Spain

This year, InterSystems is a Gold Sponsor of the JOTB.

We're more than happy to invite you and your colleagues to our InterSystems booth for a personal conversation. As always, there will be some surprises on it... 😁

#Developer Community Official #Big Data #Events #Machine Learning (ML)

Announcement Evgeny Shvarov · Jan 12, 2022

InterSystems Datasets Contest Bonuses Results

Hi contestants!

We've introduced a set of bonuses for the projects for the Datasets Contest!

Here are the projects that scored them:

#InterSystems IRIS #Big Data #Contest #Databases #Data Import and Export

Article Henrique Dias · Jan 15, 2022 2m read

Empowering the user to find the perfect dataset

Hi everyone,

I want to talk about our project and use the dataset theme for this contest.

Our intention was never to be a data curator, especially because sometimes my precious data means a lot to me, but not to the rest of the world.

My Precious

We want to go a step further and empower the user to find the perfect dataset for their needs.

Our project is a bridge between the data science community and the developer community, using InterSystems IRIS to achieve this mission.

#InterSystems IRIS #InterSystems IRIS for Health #Big Data #Contest #Data Model

Article Yuri Marx · Oct 19, 2021 5m read

Big Data components and the InterSystems IRIS

In recent years, data architectures and platforms have focused on Big Data repositories and how to process them to deliver business value. This effort produced many technologies for processing terabytes and petabytes of data, see:

The fundamental piece of Big Data technologies is HDFS (Hadoop Distributed File System). It is a distributed file system that stores terabytes or petabytes of data across arrays of storage, memory, and CPU working together.

#InterSystems IRIS #InterSystems IRIS for Health #Big Data

Article Yuri Marx · Oct 19, 2021 2m read

Using SQL (Apache Hive) into Hadoop Big Data Repositories

Hi Community,

InterSystems IRIS has a good connector for accessing Hadoop using Spark. But the market offers another excellent alternative for Big Data Hadoop access: Apache Hive. See the differences:

Hive vs. Spark
Source: https://dzone.com/articles/comparing-apache-hive-vs-spark

I created a PEX interoperability service that allows you to use Apache Hive inside your InterSystems IRIS apps. To try it, follow these steps:

1. Do a git clone to the iris-hive-adapter project:

$ git clone https://github.com/yurimarx/iris-hive-adapter.git

2. Open the terminal in this directory and run:

$ docker-compose build

#InterSystems IRIS #InterSystems IRIS for Health #Big Data #Interoperability

Article Phillip Booth · Jan 30, 2020 3m read

Using Synthea and Docker for Consistent, Realistic Synthetic Patient Generation.

Over the last couple of weeks the Solution Architecture team has been working to finish off our 2019 workload. This included open-sourcing the Readmission Demo that was brought to HIMSS last year, so we could make it available to anyone looking for an interactive way of exploring the tooling provided by IRIS.

While in the process of open sourcing the demo we were immediately hit hard with a showstopper.

#InterSystems IRIS for Health #Analytics #Best Practices #Big Data #Containerization #Docker #Tools

Article Anton Umnikov · Oct 17, 2019 5m read

Using AWS Glue with InterSystems IRIS

October 17, 2019

Anton Umnikov
Sr. Cloud Solutions Architect at InterSystems
AWS CSAA, GCP CACE

AWS Glue is a fully managed ETL (extract, transform, and load) service that makes it simple and cost-effective to categorize your data, clean it, enrich it, and move it reliably between various data stores.

In the case of InterSystems IRIS, AWS Glue allows moving large amounts of data from both cloud and on-premises data sources into IRIS.

#InterSystems IRIS #AWS #Best Practices #Big Data #Cloud #Databases #Python #SQL

Announcement Anastasia Dyubaylo · Dec 28, 2020

Online Meetup with the InterSystems Analytics Contest Winners

Hi Community,

We're pleased to invite you to the online meetup with the winners of the InterSystems Analytics Contest!

Date & Time: Monday, January 4, 2021 – 10:00 EDT

What awaits you at this virtual Meetup?

Our winners' bios.
Short demos on their applications.
An open discussion about technologies being used, bonuses, questions. Plans for the next contests.

#InterSystems IRIS #IRIS contest #Open Exchange #Artificial Intelligence (AI) #Analytics #Big Data #Events #Machine Learning (ML) #Meetup

Article Yuri Marx · Jan 4, 2021 2m read

Big Data 5V with InterSystems IRIS

See the table below:

Velocity: Elastic velocity delivered with horizontal and vertical node scaling
Enablers: Distributed memory cache, Distributed processing, Sharding and Multimodel Architecture
https://www.intersystems.com/isc-resources/wp-content/uploads/sites/24/… and https://learning.intersystems.com/course/view.php?id=1254&ssoPass=1

Value: exponential data value produced by Analytics and AI
Enablers: BI, NLP, ML, AutoML and Multimodel Architecture
https://docs.intersystems.com/irislatest/csp/docbook/Doc.View.cls?

#InterSystems IRIS #Archive #Big Data #InterSystems Business Solutions and Architectures

Announcement Anastasia Dyubaylo · Dec 28, 2020

Winners of the InterSystems Analytics Contest!

Hey Developers,

The InterSystems Analytics Contest is over. Thank you all for participating in our exciting coding marathon!

And now it's time to announce the winners!

A storm of applause goes to these developers and their applications:

#InterSystems IRIS #IRIS contest #Open Exchange #Artificial Intelligence (AI) #Analytics #Big Data #Contest #Machine Learning (ML)
