Distributed Data Management | InterSystems Developer Community

Article

Benjamin De Boe · Sep 19, 2017 4m read

Horizontal Scalability with InterSystems IRIS

Last week, we announced the InterSystems IRIS Data Platform, our new and comprehensive platform for all your data endeavours, whether transactional, analytical or both. We've included many of the features our customers know and love from Caché and Ensemble, but in this article we'll shed a little more light on one of the platform's new capabilities: SQL Sharding, a powerful new feature in our scalability story.

#Artificial Intelligence (AI) #Analytics #Distributed Data Management #ECP #Machine Learning (ML) #Sharding #SQL #InterSystems IRIS


Article

Timur Safin · Aug 19, 2016 10m read

Caché MapReduce - introduction to BigData and MapReduce concept

Several years ago everyone went mad about BigData. Nobody knew when smallish data would become BIGDATA, but everybody knew that it was trendy and the way to go.

#Artificial Intelligence (AI) #C++ #Data Model #Distributed Data Management #Machine Learning (ML) #Caché


Article

Timur Safin · Sep 2, 2016 11m read

Caché MapReduce - putting it all together – WordCount example (part III)

In part I of this series we introduced MapReduce as a generic concept, and in part II we started to approach the Caché ObjectScript implementation by introducing abstract interfaces. Now we will try to provide more concrete examples of applications using MapReduce.

#Data Model #Distributed Data Management #Object Data Model #Caché


Article

Alexey Maslov · Nov 17, 2016 11m read

ECP and Process Management API

Load balancing between several servers with relatively low capacity has been a standard feature of Caché for quite a while. It is based on the distributed cache technology called ECP (Enterprise Cache Protocol). ECP provides a host of possibilities for scaling an application horizontally while keeping the project budget fairly low. Another apparent advantage of an ECP network is that its architecture can be concealed in the depths of the Caché configuration, so that applications developed for the traditional (vertical) architecture can be migrated fairly easily to a horizontal ECP environment. The ease of this process is so mesmerizing that you start wishing it was always this way. For instance, everybody is used to being able to control Caché processes: the $Job system variable and associated classes/functions work magic in skilful hands. But wait: now processes can end up on different Caché servers…

This article is about how to gain as much transparency in controlling processes in an ECP environment as in a traditional (non-ECP) one.

#Caché #Distributed Data Management #ECP


Article

Benjamin De Boe · Jan 31, 2018 4m read

Introducing the InterSystems IRIS Connector for Apache Spark

With the release of InterSystems IRIS, we're also making available a nifty bit of software that allows you to get the best out of your InterSystems IRIS cluster when working with Apache Spark for data processing, machine learning and other data-heavy fun. Let's take a closer look at how we're making your life as a Data Scientist easier, as you're probably facing tough big data challenges already, just from the influx of job offers in your inbox!

#Artificial Intelligence (AI) #Analytics #Big Data #Distributed Data Management #Java #Machine Learning (ML) #Sharding #InterSystems IRIS


Article

Sergey Lukyanchikov · Apr 7, 2021 9m read

Distributed Artificial Intelligence with InterSystems IRIS

What is Distributed Artificial Intelligence (DAI)?

Attempts to find a “bullet-proof” definition have not produced a result: it seems the term is slightly “ahead of its time”. Still, we can analyze the term itself semantically, deriving that distributed artificial intelligence is the same AI (see our effort to suggest an “applied” definition), though partitioned across several computers that are not clustered together (neither data-wise, nor via applications, nor by providing access to particular computers in principle). That is, ideally, distributed artificial intelligence should be arranged in such a way that none of the computers participating in that “distribution” has direct access to the data or applications of another computer: the only alternative becomes transmission of data samples and executable scripts via “transparent” messaging. Any deviation from that ideal leads to “partially distributed artificial intelligence”, an example being distributed data with a central application server, or its inverse. One way or the other, we obtain as a result a set of “federated” models (i.e., models each trained on their own data sources, or each trained by their own algorithms, or “both at once”).

#Artificial Intelligence (AI) #Cloud #Convergent Analytics #Distributed Data Management #Machine Learning (ML) #InterSystems IRIS
