Paul Riker · Mar 29, 2019
Ensemble as a Data lake

We have been storing raw messages in a MySQL database for disaster recovery (DR) and ad hoc querying. We are thinking of using an Ensemble instance as our data lake instead. We could segregate the source data by namespace or by global, but either way we'll want a custom global to index the data so that retrieval stays fast.
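To make the "data global plus custom index global" idea concrete, here is a minimal in-memory sketch in Python, not ObjectScript. It assumes one global holding the raw text keyed by message id and a second global whose subscripts are (field, value, message id). The names `^RawMsg` and `^RawMsgIdx`, the `store`/`find` helpers, and the sample fields are all hypothetical and only illustrate the layout.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Stand-in for a data global, e.g. ^RawMsg(msgId) = raw text (hypothetical name).
raw_msgs: Dict[int, str] = {}

# Stand-in for an index global, e.g. ^RawMsgIdx(field, value, msgId) = "" (hypothetical name).
# Only the subscripts carry information, so a set of ids per (field, value) is enough here.
msg_index: Dict[Tuple[str, str], Set[int]] = defaultdict(set)


def store(msg_id: int, raw: str, fields: Dict[str, str]) -> None:
    """Save the raw message and add one index entry per extracted field."""
    raw_msgs[msg_id] = raw
    for name, value in fields.items():
        msg_index[(name, value)].add(msg_id)


def find(field: str, value: str) -> List[str]:
    """Return raw messages whose indexed field equals value, in id order."""
    ids = sorted(msg_index.get((field, value), set()))
    return [raw_msgs[i] for i in ids]


# Illustrative sample data only.
store(1, "msg-1 ADT^A01 MRN=12345", {"type": "ADT^A01", "mrn": "12345"})
store(2, "msg-2 ORU^R01 MRN=12345", {"type": "ORU^R01", "mrn": "12345"})
store(3, "msg-3 ADT^A01 MRN=67890", {"type": "ADT^A01", "mrn": "67890"})

print(find("mrn", "12345"))
print(find("type", "ADT^A01"))
```

The point of the second structure is that a lookup touches only the index entries for one (field, value) pair instead of scanning every stored message. In Ensemble itself the equivalent lookup would walk the index subscripts with `$ORDER` rather than sorting a set in memory.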

Anyone else taking this approach? Any feedback?

With the release of InterSystems IRIS, we're also making available a nifty bit of software that allows you to get the best out of your InterSystems IRIS cluster when working with Apache Spark for data processing, machine learning and other data-heavy fun. Let's take a closer look at how we're making your life as a Data Scientist easier, as you're probably already facing tough big data challenges, judging by the influx of job offers in your inbox!

Apache Spark has rapidly become one of the most exciting technologies for big data analytics and machine learning. Spark is a general data processing engine created for use in clustered computing environments. At its heart is the Resilient Distributed Dataset (RDD), a distributed, fault-tolerant collection of data that can be operated on in parallel across the nodes of a cluster. Spark is implemented in a combination of Java and Scala, so it comes as a library that can run on any JVM.
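The RDD idea described above can be sketched on a single machine. The toy class below is not Spark's API; it is a hypothetical `ToyDataset` that only mimics two properties of an RDD: data split into partitions that are processed independently, and a recorded chain of transformations (lineage) from which a lost partition can be recomputed. It runs partitions on a thread pool purely to show their independence; under CPython's GIL this gives no real CPU speedup, whereas Spark ships partitions to executors on different nodes.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from typing import Any, Callable, List, Optional


def partition(data: List[Any], n: int) -> List[List[Any]]:
    """Split data into n slices (round-robin), like an RDD's partitions."""
    return [data[i::n] for i in range(n)]


class ToyDataset:
    """Single-machine stand-in for an RDD (hypothetical, not Spark's API).

    Keeps the source partitions plus the list of transformations applied
    so far (its lineage), so any partition can be rebuilt from scratch.
    """

    def __init__(self, partitions: List[List[Any]],
                 lineage: Optional[List[Callable[[Any], Any]]] = None):
        self.partitions = partitions
        self.lineage = lineage or []

    def map(self, fn: Callable[[Any], Any]) -> "ToyDataset":
        # Transformations are lazy: they only extend the lineage.
        return ToyDataset(self.partitions, self.lineage + [fn])

    def compute_partition(self, idx: int) -> List[Any]:
        # Replay the lineage on one source partition; this is also how a
        # lost partition would be recovered.
        out = self.partitions[idx]
        for fn in self.lineage:
            out = [fn(x) for x in out]
        return out

    def reduce(self, fn: Callable[[Any, Any], Any]) -> Any:
        # An action: compute every partition independently, reduce each
        # one locally, then combine the partial results.
        with ThreadPoolExecutor() as pool:
            results = list(pool.map(self.compute_partition,
                                    range(len(self.partitions))))
        partials = [reduce(fn, part) for part in results if part]
        return reduce(fn, partials)


ds = ToyDataset(partition(list(range(1, 11)), 3)).map(lambda x: x * x)
print(ds.reduce(lambda a, b: a + b))  # sum of squares of 1..10
print(ds.compute_partition(2))        # rebuild partition 2 from its lineage
```

The reduce function must be associative (addition here), because partitions are combined in whatever grouping the partitioning produces; Spark's own `reduce` action has the same requirement.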

This is the first article of a series diving into visualization tools and analysis of time series data. Obviously we are most interested in looking at performance-related data we can gather from the Caché family of products. However, as we'll see down the road, we are absolutely not limited to that. For now we are exploring Python and the libraries and tools available within its ecosystem.
