Last week, we announced the InterSystems IRIS Data Platform, our new and comprehensive platform for all your data endeavours, whether transactional, analytics or both. We've included many of the features our customers know and loved from Caché and Ensemble, but in this article we'll shed a little more light on one of the new capabilities of the platform: SQL Sharding, a powerful new feature in our scalability story.
In this article, we will run an InterSystems IRIS cluster using docker and Merge CPF files - a new feature allowing you to configure servers with ease.
On UNIX® and Linux, you can modify the default iris.cpf using a declarative CPF merge file. A merge file is a partial CPF that sets the desired values for any number of parameters upon instance startup. The CPF merge operation works only once for each instance.
Our cluster architecture is very simple, it would consist of one Node1 (master node) and two Data Nodes (check all available roles). Unfortunately, docker-compose cannot deploy to several servers (although it can deploy to remote hosts), so this is useful for local development of sharding-aware data models, tests, and such. For a productive InterSystems IRIS Cluster deployment, you should use either ICM or IKO.
InterSystems and Intel recently conducted a series of benchmarks combining InterSystems IRIS with 2nd Generation Intel® Xeon® Scalable Processors, also known as “Cascade Lake”, and Intel® Optane™ DC Persistent Memory (DCPMM). The goals of these benchmarks are to demonstrate the performance and scalability capabilities of InterSystems IRIS with Intel’s latest server technologies in various workload settings and server configurations. Along with various benchmark results, three different use-cases of Intel DCPMM with InterSystems IRIS are provided in this report.
With the release of InterSystems IRIS, we're also making available a nifty bit of software that allows you to get the best out of your InterSystems IRIS cluster when working with Apache Spark for data processing, machine learning and other data-heavy fun. Let's take a closer look at how we're making your life as a Data Scientist easier, as you're probably already facing tough big data challenges already, just from the influx of job offers in your inbox!
Prior to IRIS, using ECP to share databases under heavy I/O load has known latency issues (in our environment), which pretty much restricted using shared database with relatively static or slow-changing data. In IRIS, will sharding mitigate the latency issue and allow any table to be shared?
1. Lets assume we have a global (matrix) [X,Y,Z] that is distributed across sharded nodes 2. Matrix size doesnt matter, but lets assume it holds 500 GB for the moment 3. I want to return all rows where f(x,y,z) is true. f() is an arbitrary function, i.e. f = x + 20(y*y) > z