#Big Data

1 Follower · 46 Posts

Big data is a field that treats ways to analyze, systematically extract information from. Big data challenges include capturing data, data storage, data analysis, search, sharing, transfer, visualization, querying, updating, information privacy and data source. 

Learn more.

Article Benjamin De Boe · Apr 2 2m read

What’s New in InterSystems IRIS and IRIS for Health 2026.1

InterSystems IRIS 2026.1 is here, and it’s packed with powerful enhancements designed to help organizations scale their data management like never before. Whether you’re dealing with the operational aspects of managing massive datasets or looking to optimize storage costs, this release brings a host of features to simplify life with your data and meet the growing challenges of very large datasets.

0
0 106
Article Yuri Marx · Nov 18, 2025 11m read

Modern data architectures utilize real-time data capture, transformation, movement, and loading solutions to build data lakes, analytical warehouses, and big data repositories. It enables the analysis of data from various sources without impacting the operations that use them. To achieve this, establishing a continuous, scalable, elastic, and robust data flow is essential. The most prevalent method for that is through the CDC (Change Data Capture)  technique. CDC monitors for small data set production, automatically captures this data, and delivers it to one or more recipients, including analytical data repositories. The major benefit is the elimination of the D+1 delay in analysis, as data is detected at the source as soon as it is produced, and later is replicated to the destination.

This article will demonstrate the two most common data sources for CDC scenarios, both as a source and a destination. For the data source (origin), we will explore the CDC in SQL databases and CSV files. For the data destination, we will use a columnar database (a typical high-performance analytical database scenario) and a Kafka topic (a standard approach for streaming data to the cloud and/or to multiple real-time data consumers).

Overview

This article will provide a sample for the following interoperability scenario:

0
4 259
Question Kanishk Mittal · Jul 28, 2025

We’re building out a data lake in IRIS 2025.1 that aggregates data across multiple business systems and departments. I’m trying to establish best practices for schema design and separation.

Right now, I’m thinking of using a separate schema for each distinct system of record feeding into the data lake - for example, one schema per upstream source system, rather than splitting based on function (e.g. staging, raw, curated). The idea is that this would make it easier to manage source ownership, auditing, and pipeline logic, especially when multiple domains are contributing data.

0
0 119