According to Databricks, Apache Parquet is an open source, column-oriented data file format designed for efficient data storage and retrieval. It provides efficient data compression and encoding schemes with enhanced performance to handle complex data in bulk. Apache Parquet is designed to be a common interchange format for both batch and interactive workloads. It is similar to other columnar-storage file formats available in Hadoop, namely RCFile and ORC. (source: https://www.databricks.com/glossary/what-is-parquet).
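
To make "column-oriented" concrete, here is a minimal Python sketch. It is only an illustration of the general idea, not Parquet's actual on-disk layout, and the sample records and helper names (`to_columns`, `run_length_encode`) are made up for this example. It pivots row records into per-column lists, reads a single column without touching the others, and shows why grouping similar values together tends to compress well.

```python
from itertools import groupby

# Hypothetical sample records, invented for illustration only.
rows = [
    {"city": "Boston", "year": 2022, "sales": 10},
    {"city": "Boston", "year": 2022, "sales": 12},
    {"city": "Malaga", "year": 2022, "sales": 7},
    {"city": "Malaga", "year": 2022, "sales": 9},
]


def to_columns(records):
    """Pivot a list of row dicts into a dict of column lists."""
    return {name: [record[name] for record in records] for name in records[0]}


def run_length_encode(values):
    """Collapse runs of equal adjacent values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


columns = to_columns(rows)

# An aggregate over one column only needs that column's values.
total_sales = sum(columns["sales"])

# Values in a single column are often repetitive, so simple encodings shrink them.
encoded_city = run_length_encode(columns["city"])
encoded_year = run_length_encode(columns["year"])
```

In a row-oriented format such as CSV, computing `total_sales` would mean scanning every field of every record. A columnar layout lets a reader skip the columns it does not need, which is one reason formats like Parquet suit analytical queries.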

In the world of Big Data, selecting the right file format is crucial for efficient data storage, processing, and analysis. With the massive amount of data generated every day, choosing the appropriate format can greatly impact the speed, cost, and accuracy of data processing tasks. There are several file formats available, each with its own set of advantages and disadvantages, making the decision of which one to use complex. Some of the popular Big Data file formats include CSV, JSON, Avro, ORC, and Parquet.

In today's landscape, enterprises have grown substantially in scale, amassing vast amounts of data. This data is collected from a plethora of sources including different applications, databases, and other channels. Given the diversity and volume of this data, it's only logical for these enterprises to seek a deeper understanding of what their data entails. Some of this data may be stored in IRIS, and it is reasonable to want to add it to a data lake as well.

Many tools for such tasks are now available online that do not yet support IRIS, but adding that support is achievable.

Article
· Oct 19, 2022 3m read
Ingestion and Querying Speed Test

The ability to ingest many records per second while simultaneously serving real-time queries is called Hybrid Transactional Analytical Processing (HTAP). It is also known as transactional analytics, transanalytics, or translytics, and it is very useful in scenarios with a constant flow of real-time data, such as IIoT sensor readings or stock market fluctuations, where these data sets must be queried in real time or near real time.
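
As a rough illustration of that pattern, the Python sketch below interleaves ingestion and analytical queries against a single store. It uses an in-memory SQLite database purely as a stand-in. The table, the sensor values, and the helper names (`ingest`, `average_per_sensor`) are assumptions made for this example, not the setup used in the speed test, and a real HTAP workload would run writers and readers concurrently at far higher volume.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in for an HTAP-capable store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (sensor_id INTEGER, value REAL)")


def ingest(batch):
    """Transactional side: insert a batch of (sensor_id, value) rows."""
    conn.executemany("INSERT INTO readings VALUES (?, ?)", batch)
    conn.commit()


def average_per_sensor():
    """Analytical side: aggregate over everything ingested so far."""
    cursor = conn.execute(
        "SELECT sensor_id, AVG(value) FROM readings "
        "GROUP BY sensor_id ORDER BY sensor_id"
    )
    return cursor.fetchall()


# Alternate ingestion and querying so each query sees the freshest data.
snapshots = []
for step in range(3):
    ingest([(1, float(step)), (2, float(step * 10))])
    snapshots.append(average_per_sensor())

conn.close()
```

Each entry in `snapshots` reflects the rows ingested up to that point, which is the property the definition above describes: queries run against live data rather than against a copy loaded later.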

Hey Developers,

A new video is already on the InterSystems Developers YouTube channel:

InterSystems HealthShare Analytics Solution: Create & Deliver Real-Time Insight at Scale

https://www.youtube.com/embed/PbSKedG25eA

Hey Community,

This session provides more detail about the Smart Data Fabric announcement at #VSummit'21:

Introducing the Smart Data Fabric

https://www.youtube.com/embed/00ilwHJY6B4

Hey Developers,

Good news! One more upcoming in-person event is nearby.

We're pleased to invite you to join "J On The Beach", an international rendezvous for developers and DevOps around Big Data technologies. A fun conference to learn and share the latest experiences, tips & tricks related to Big Data technologies, and, the most important part, it’s On The Beach!

🗓 April 27-29, 2022

📍Málaga, Spain

This year, InterSystems is a Gold Sponsor of the JOTB.

We're more than happy to invite you and your colleagues to our InterSystems booth for a personal conversation. As always, there will be some surprises there... 😁

Hi everyone,

I want to talk about our project and use the dataset theme for this contest.

Our intention was never to be a data curator, especially because sometimes my precious data means a lot to me but not to the rest of the world.

My Precious

We want to go a step further and empower the user to find the perfect dataset for their needs.

Our project is a bridge between the data science community and the developer community, using InterSystems IRIS to achieve this mission.

Over the last couple of weeks the Solution Architecture team has been working to finish off our 2019 workload. This included open-sourcing the Readmission Demo that was brought to HIMSS last year, so we could make it available to anyone looking for an interactive way of exploring the tooling provided by IRIS.

Hi Community,

We're pleased to invite you to the online meetup with the winners of the InterSystems Analytics Contest!

Date & Time: Monday, January 4, 2021 – 10:00 EDT

What awaits you at this virtual Meetup?

  • Our winners' bios.
  • Short demos on their applications.
  • An open discussion about technologies being used, bonuses, questions. Plans for the next contests.

Hey Developers,

This week is a voting week for the InterSystems Analytics Contest! So, it's time to give your vote to the best solutions built with InterSystems IRIS.

🔥 You decide: VOTING IS HERE 🔥

How to vote?

Please meet the new voting engine and algorithm for the Experts and Community nomination:

Hi Community!

We are pleased to invite all developers to the upcoming InterSystems Analytics Contest Kick-off Webinar! This webinar is dedicated to the Analytics contest.

In this webinar, we’ll demo the iris-analytics-template and answer questions on how to develop, build, and deploy Analytics applications using InterSystems IRIS.

Date & Time: Monday, December 7 — 12:00 PM EDT

Speakers:
🗣 @Carmen Logue, InterSystems Product Manager - Analytics and AI
🗣 @Evgeny Shvarov, InterSystems Developer Ecosystem Manager

As we all well know, InterSystems IRIS has an extensive range of tools for improving the scalability of application systems. In particular, much has been done to facilitate the parallel processing of data, including the use of parallelism in SQL query processing and the most attention-grabbing feature of IRIS: sharding. However, many mature developments that started back in Caché and have been carried over into IRIS actively use the multi-model features of this DBMS, which are understood as allowing the coexistence of different data models within a single database. For example, the HIS qMS database contains both semantic relational (electronic medical records) as well as traditional relational (interaction with PACS) and hierarchical data models (laboratory data and integration with other systems). Most of the listed models are implemented using SP.ARM's qWORD tool (a mini-DBMS that is based on direct access to globals). Therefore, unfortunately, it is not possible to use the new capabilities of parallel query processing for scaling, since these queries do not use IRIS SQL access.

Meanwhile, as the size of the database grows, most of the problems inherent to large relational databases also apply to non-relational ones. This is a major reason why we are interested in parallel data processing as one of the tools that can be used for scaling.

In this article, I would like to discuss those aspects of parallel data processing that I have been dealing with over the years when solving tasks that are rarely mentioned in discussions of Big Data. I will focus on the technological transformation of databases, or, rather, technologies for transforming databases.

Hi everyone.
We are a team from the company "Constructor", and we develop cutting-edge cartographic systems. Recently the amount of image data has skyrocketed, so we want to give our users the ability to tie images to places automatically. For that, we want to use AI/ML technologies, and we have a cool task for you.

https://cloud.mail.ru/public/pHbC/4r7Z58m6f/

Hi Community,

The new video from Global Summit 2019 is already on InterSystems Developers YouTube:

Automated InterSystems IRIS Cloud Scaling

https://www.youtube.com/embed/4O2Nr5f_0vo

InterSystems and Intel recently conducted a series of benchmarks combining InterSystems IRIS with 2nd Generation Intel® Xeon® Scalable Processors, also known as “Cascade Lake”, and Intel® Optane™ DC Persistent Memory (DCPMM). The goals of these benchmarks are to demonstrate the performance and scalability capabilities of InterSystems IRIS with Intel’s latest server technologies in various workload settings and server configurations. Along with various benchmark results, three different use-cases of Intel DCPMM with InterSystems IRIS are provided in this report.

Speaker: Mike Gualtieri, Research Vice President & Principal Analyst, Forrester
Learn why analyst firm Forrester calls translytical data platforms the ideal database technology. Attendees will receive free copies of the "Forrester Wave™: Translytical Data Platforms, Q4 2019."
Article
· Jul 27, 2018 4m read
Load a ML model into InterSystems IRIS

Hi all. Today we are going to upload an ML model into IRIS Manager and test it.

Note: I have done the following on Ubuntu 18.04, Apache Zeppelin 0.8.0, Python 3.6.5.

Introduction

These days, many different data mining tools enable you to develop predictive models and analyze your data with unprecedented ease. InterSystems IRIS Data Platform provides a stable foundation for your big data and fast data applications, providing interoperability with modern data mining tools.
