Enterprises today operate at enormous scale and amass vast amounts of data, collected from a plethora of sources: applications, databases, and other channels. Given the diversity and volume of this data, it is only logical for these enterprises to seek a deeper understanding of what it contains. Some of that data may already live in IRIS, and it is often reasonable to want to add it to a data lake as well.

The Internet now offers many different tools for such tasks. Most of them do not yet support IRIS out of the box, but integrating it is achievable.

Apache Spark has rapidly become one of the most exciting technologies for big data analytics and machine learning. Spark is a general data processing engine created for use in clustered computing environments. At its heart is the Resilient Distributed Dataset (RDD), which represents a distributed, fault-tolerant collection of data that can be operated on in parallel across the nodes of a cluster. Spark is implemented in a combination of Java and Scala, so it comes as a library that can run on any JVM.
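
To make the RDD idea concrete, here is a minimal PySpark sketch (assuming a local Spark installation; the numbers and app name are made up) that parallelizes a collection, transforms it across partitions, and reduces the result:

```python
from pyspark.sql import SparkSession

# Start (or reuse) a local Spark session; on a real cluster the master URL
# would point at the cluster manager instead of local[*].
spark = SparkSession.builder.master("local[*]").appName("rdd-demo").getOrCreate()

# An RDD is a distributed, fault-tolerant collection partitioned across nodes.
numbers = spark.sparkContext.parallelize(range(1, 1_000_001), numSlices=8)

# Transformations (map, filter) are lazy; the reduce action triggers execution.
sum_of_squares = numbers.map(lambda x: x * x).reduce(lambda a, b: a + b)
print(sum_of_squares)

spark.stop()
```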

InterSystems and Intel recently conducted a series of benchmarks combining InterSystems IRIS with 2nd Generation Intel® Xeon® Scalable Processors, also known as “Cascade Lake”, and Intel® Optane™ DC Persistent Memory (DCPMM). The goals of these benchmarks are to demonstrate the performance and scalability capabilities of InterSystems IRIS with Intel’s latest server technologies in various workload settings and server configurations. Along with various benchmark results, three different use-cases of Intel DCPMM with InterSystems IRIS are provided in this report.

As we all know, InterSystems IRIS has an extensive range of tools for improving the scalability of application systems. In particular, much has been done to facilitate the parallel processing of data, including the use of parallelism in SQL query processing and the most attention-grabbing feature of IRIS: sharding. However, many mature developments that started back in Caché and have been carried over into IRIS actively use the multi-model features of this DBMS, meaning that different data models coexist within a single database. For example, the HIS qMS database contains semantic relational (electronic medical records), traditional relational (interaction with PACS), and hierarchical (laboratory data and integration with other systems) data models. Most of these models are implemented using SP.ARM's qWORD tool (a mini-DBMS based on direct access to globals). Unfortunately, this means the new parallel query processing capabilities cannot be used for scaling, since these queries do not go through IRIS SQL access.

Meanwhile, as the size of the database grows, most of the problems inherent to large relational databases become just as relevant for non-relational ones. This is a major reason why we are interested in parallel data processing as one of the tools that can be used for scaling.

In this article, I would like to discuss those aspects of parallel data processing that I have been dealing with over the years when solving tasks that are rarely mentioned in discussions of Big Data. I am going to be focusing on the technological transformation of databases, or, rather, technologies for transforming databases.

Article
· Oct 19, 2022 3m read
Ingestion and Querying Speed Test

The capacity to ingest numerous records every second while simultaneously serving real-time queries is called Hybrid Transactional Analytical Processing (HTAP). It is also known as transactional analytics, transanalytics, or translytics, and it is a very useful capability in scenarios with a constant flow of real-time data, such as IIoT sensors or stock market fluctuations, combined with the need to query those data sets in real time or near real time.
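
As a rough, illustrative sketch of what an HTAP workload looks like from Python (the iris driver connection parameters and the Demo.Readings table are assumptions, not part of the article), one thread ingests rows continuously while another runs aggregate queries against the same table:

```python
import threading
import random
import time
import iris  # assumed: the intersystems-irispython DB-API driver

def connect():
    # Hypothetical connection details; adjust host, port, namespace, credentials.
    return iris.connect("localhost", 1972, "USER", "_SYSTEM", "SYS")

def ingest(rows=10_000):
    # Transactional side: a steady stream of small inserts, committed in batches.
    conn = connect()
    cur = conn.cursor()
    for i in range(rows):
        cur.execute("INSERT INTO Demo.Readings (SensorId, Value) VALUES (?, ?)",
                    [random.randint(1, 100), random.random()])
        if i % 1000 == 0:
            conn.commit()
    conn.commit()

def analyze(queries=20):
    # Analytical side: aggregate queries over the same, still-growing table.
    cur = connect().cursor()
    for _ in range(queries):
        cur.execute("SELECT SensorId, AVG(Value) FROM Demo.Readings GROUP BY SensorId")
        cur.fetchall()
        time.sleep(0.5)

writer = threading.Thread(target=ingest)
reader = threading.Thread(target=analyze)
writer.start(); reader.start()
writer.join(); reader.join()
```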

This is the first article of a series diving into visualization tools and analysis of time series data. Obviously, we are most interested in looking at performance-related data we can gather from the Caché family of products. However, as we'll see down the road, we are absolutely not limited to that. For now, we are exploring Python and the libraries/tools available within that ecosystem.
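
As a taste of what that ecosystem offers, here is a minimal sketch (the CSV file name, column names, and metric are hypothetical) that loads timestamped performance metrics with pandas and plots them with matplotlib:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical export of performance counters with a timestamp column.
df = pd.read_csv("mgstat_export.csv", parse_dates=["timestamp"], index_col="timestamp")

# Resample to one-minute averages to smooth out noise, then plot one metric.
df["Glorefs"].resample("1min").mean().plot(figsize=(10, 4),
                                           title="Global references per second")
plt.xlabel("time")
plt.ylabel("Glorefs")
plt.tight_layout()
plt.show()
```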

Hi Community!

We are pleased to invite all developers to the upcoming InterSystems Analytics Contest Kick-off Webinar! This webinar is dedicated to the Analytics contest.

In this webinar, we'll demo the iris-analytics-template and answer questions about how to develop, build, and deploy Analytics applications using InterSystems IRIS.

Date & Time: Monday, December 7 — 12:00 PM EDT

Speakers:
🗣 @Carmen Logue, InterSystems Product Manager - Analytics and AI
🗣 @Evgeny Shvarov, InterSystems Developer Ecosystem Manager

Hey Developers,

This week is a voting week for the InterSystems Analytics Contest! So, it's time to give your vote to the best solutions built with InterSystems IRIS.

🔥 You decide: VOTING IS HERE 🔥

How to vote?

Please meet the new voting engine and algorithm for the Experts and Community nomination:

Hi Community,

We're pleased to invite you to the online meetup with the winners of the InterSystems Analytics Contest!

Date & Time: Monday, January 4, 2021 – 10:00 EDT

What awaits you at this virtual Meetup?

  • Our winners' bios.
  • Short demos of their applications.
  • An open discussion about the technologies used, bonuses, and questions, plus plans for the next contests.

Loading your IRIS data into your Google Cloud BigQuery data warehouse and keeping it current can be a hassle with bulky commercial off-the-shelf ETL platforms, but it is made dead simple using the iris2bq utility.

Let's say IRIS is contributing to the workload for a hospital system: routing DICOM images, ingesting HL7 messages, posting FHIR resources, or pushing CCDAs to the next provider in a transition of care. Natively, IRIS persists these objects at various stages of the pipeline through the nature of the business processes and anything you included along the way. Let's send that up to Google BigQuery to augment and complement the rest of our data warehouse, and ETL (Extract, Transform, Load) or ELT (Extract, Load, Transform) to our heart's desire.

A reference architecture diagram may be worth a thousand words, but three bullet points may work a little better:

  • It exports the data from IRIS into DataFrames
  • It saves them into GCS as .avro files to keep the schema alongside the data, which avoids having to specify or create the BigQuery table schema beforehand.
  • It starts BigQuery load jobs to import those .avro files into the BigQuery tables you specify (a sketch of this last step follows below).
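
For illustration, here is a minimal sketch of what that last step looks like with the google-cloud-bigquery client library (the bucket, dataset, and table names are hypothetical; iris2bq handles this for you):

```python
from google.cloud import bigquery

client = bigquery.Client()

# Avro files carry their own schema, so no explicit table schema is needed.
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.AVRO,
    write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
)

load_job = client.load_table_from_uri(
    "gs://my-bucket/iris-export/patients-*.avro",  # hypothetical GCS path
    "my-project.my_dataset.patients",              # hypothetical destination table
    job_config=job_config,
)
load_job.result()  # wait for the load job to finish
print(client.get_table("my-project.my_dataset.patients").num_rows, "rows loaded")
```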

Article
· Nov 20, 2023 3m read
Parquet files and InterSystems IRIS

In the world of Big Data, selecting the right file format is crucial for efficient data storage, processing, and analysis. With the massive amount of data generated every day, choosing the appropriate format can greatly impact the speed, cost, and accuracy of data processing tasks. There are several file formats available, each with its own set of advantages and disadvantages, making the decision of which one to use complex. Some of the popular Big Data file formats include CSV, JSON, Avro, ORC, and Parquet.
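
As a quick illustration of working with one of these formats from Python (assuming pandas with the pyarrow engine installed; the file name and columns are made up):

```python
import pandas as pd

# A tiny example frame standing in for real data.
df = pd.DataFrame({
    "patient_id": [1, 2, 3],
    "heart_rate": [72, 88, 64],
})

# Write and read Parquet; the columnar layout lets readers load only the columns they need.
df.to_parquet("readings.parquet", engine="pyarrow")
only_rates = pd.read_parquet("readings.parquet", columns=["heart_rate"])
print(only_rates)
```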

With the release of InterSystems IRIS, we're also making available a nifty bit of software that allows you to get the best out of your InterSystems IRIS cluster when working with Apache Spark for data processing, machine learning, and other data-heavy fun. Let's take a closer look at how we're making your life as a Data Scientist easier, as you're probably facing tough big data challenges already, judging just from the influx of job offers in your inbox!
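
For a flavor of reading IRIS tables into Spark DataFrames from PySpark, here is a minimal sketch using Spark's generic JDBC data source with the IRIS JDBC driver (the jar path, connection details, and table name are hypothetical; the dedicated IRIS Spark connector described in this article provides tighter, cluster-aware integration than plain JDBC):

```python
from pyspark.sql import SparkSession

# The IRIS JDBC driver jar must be on Spark's classpath (path is hypothetical).
spark = (SparkSession.builder
         .appName("iris-jdbc-demo")
         .config("spark.jars", "/opt/iris/intersystems-jdbc.jar")
         .getOrCreate())

# Hypothetical connection details; adjust host, port, namespace, and credentials.
df = (spark.read.format("jdbc")
      .option("url", "jdbc:IRIS://localhost:1972/USER")
      .option("driver", "com.intersystems.jdbc.IRISDriver")
      .option("dbtable", "Demo.Patients")
      .option("user", "_SYSTEM")
      .option("password", "SYS")
      .load())

df.printSchema()
print(df.count())
```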

Article
· Jul 27, 2018 4m read
Load a ML model into InterSystems IRIS

Hi all. Today we are going to upload an ML model into IRIS Manager and test it.

Note: I have done the following on Ubuntu 18.04, Apache Zeppelin 0.8.0, Python 3.6.5.

Introduction

These days, many different data mining tools enable you to develop predictive models and analyze your data with unprecedented ease. The InterSystems IRIS Data Platform provides a stable foundation for your big data and fast data applications, offering interoperability with modern data mining tools.
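
Before loading anything into IRIS, you need a trained model to work with. Here is a minimal, generic scikit-learn sketch (not the model from this article; the data is synthetic) of training and serializing a predictive model:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
import joblib

# Synthetic data standing in for a real training set.
X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))

# Serialize the model so it can be shipped to wherever it will be served from.
joblib.dump(model, "model.joblib")
```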

Question
· Mar 29, 2019
Ensemble as a Data lake

We have been storing raw messages in a MySQL database for DR and ad hoc purposes. We are thinking of using an Ensemble instance as our data lake instead. We could segregate the source data by namespace or by global. But either way we'll want a custom global to index the data for data retrieval performance purposes.

Anyone else taking this approach? Any feedback?
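
For what it's worth, a custom index global of that kind might look something like the following sketch, written against the Native SDK for Python on IRIS (the global names, subscripts, and connection details are purely illustrative; on Ensemble itself this would typically be done in ObjectScript):

```python
import iris  # assumed: intersystems-irispython with the Native SDK

# Hypothetical connection and global names.
conn = iris.connect("localhost", 1972, "DATALAKE", "_SYSTEM", "SYS")
native = iris.createIRIS(conn)

def store_message(msg_id, source, timestamp, raw_text):
    # Raw message body keyed by id...
    native.set(raw_text, "RawMsg", msg_id)
    # ...plus a custom index global keyed by source and timestamp for fast retrieval.
    native.set("", "RawMsgIndex", source, timestamp, msg_id)

store_message(1, "LabSystem", "2019-03-29 10:15:00", "MSH|^~\\&|...")
```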

Greetings, community. I would like to know how to migrate a production database to a local environment. When I have a system in production (a SQL Server database), what we do is mount a local copy to analyze the data without consuming resources of the production system. My question is: how do you do this with InterSystems technology?

Over the last couple of weeks, the Solution Architecture team has been working to finish off our 2019 workload: this included open-sourcing the Readmission Demo that was brought to HIMSS last year, so we could make it available to anyone looking for an interactive way of exploring the tooling provided by IRIS.

According to Databricks, Apache Parquet is an open source, column-oriented data file format designed for efficient data storage and retrieval. It provides efficient data compression and encoding schemes with enhanced performance to handle complex data in bulk. Apache Parquet is designed to be a common interchange format for both batch and interactive workloads. It is similar to other columnar-storage file formats available in Hadoop, namely RCFile and ORC. (source: https://www.databricks.com/glossary/what-is-parquet)

Hi everyone,

I want to talk about our project and use the dataset theme for this contest.

Our intention was never to be a data curator, especially because sometimes my precious data means a lot to me, but not to the rest of the world.

My Precious

We want to go a step further and empower the user to find the perfect dataset for their needs.

Our project is a bridge between the data science community and the developer community, using InterSystems IRIS to achieve this mission.

Hey Developers,

Good news! Another in-person event is coming up soon.

We're pleased to invite you to join "J On The Beach", an international rendezvous for developers and DevOps around Big Data technologies. It's a fun conference for learning and sharing the latest experiences, tips & tricks related to Big Data, and, the most important part, it's On The Beach!

🗓 April 27-29, 2022

📍Málaga, Spain

This year, InterSystems is a Gold Sponsor of the JOTB.

We're more than happy to invite you and your colleagues to our InterSystems booth for a personal conversation. As always, there will be some surprises waiting for you there... 😁
