The rise of Big Data projects, real-time self-service analytics, online query services, and social networks, among others, has created scenarios that call for massive, high-performance data queries. In response to this challenge, MPP (massively parallel processing) database technology was created, and it quickly established itself. Among the open-source MPP options, Presto (https://prestodb.io/) is the best known. It originated at Facebook, where it was used for data analytics, and was later open-sourced.

Article
· Jul 7, 2017 19m read
Indexing of non-atomic attributes

Quotes (1NF/2NF/3NF):

Every row-and-column intersection contains exactly one value from the applicable domain (and nothing else).
The same value can be atomic or non-atomic depending on the purpose of this value. For example, “4286” can be
  • atomic, if it denotes “a credit card’s PIN code” (if it’s broken down or reshuffled, it is of no use any longer)
  • non-atomic, if it’s just a “sequence of numbers” (the value still makes sense if broken down into several parts or reshuffled)

This article explores the standard methods of increasing the performance of SQL queries involving the following types of fields: string, date, simple list (in the $LB format), "list of <...>" and "array of <...>".
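To make the idea of indexing a non-atomic value concrete, here is a minimal sketch in Python. It is not how InterSystems IRIS implements its indices; it only illustrates the general technique of indexing each element of a list-valued attribute so that a lookup by element does not have to scan every row. The row data and the function name `build_element_index` are assumptions made up for this illustration.

```python
from collections import defaultdict

def build_element_index(rows):
    """Map each list element to the set of row IDs whose list contains it."""
    index = defaultdict(set)
    for row_id, values in rows.items():
        for value in values:
            index[value].add(row_id)
    return index

# Hypothetical rows: row ID -> list-valued (non-atomic) attribute.
rows = {
    1: ["red", "green"],
    2: ["green", "blue"],
    3: ["blue"],
}

index = build_element_index(rows)

# Finding every row whose list contains "green" is now a single lookup.
matches = sorted(index["green"])
print(matches)
```

The trade-off is the usual one for indices: each insert or update of a list value has to maintain one index entry per element.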


High-Performance Message Searching in Health Connect

The Problem

Have you ever tried to run a search in Message Viewer on a busy interface and had the query time out? This can become quite a problem as the amount of data increases. For context, the instance of Health Connect I am working with processes roughly 155 million Message Headers per day with 21-day message retention. To help with search performance, we extended the built-in SearchTable with commonly used fields, hoping that indexing these fields would result in faster query times. Despite this, we still couldn't get some of these queries to finish at all.
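To put those figures in perspective, the short calculation below estimates how many Message Headers are retained at any one time. It uses only the two numbers quoted above and assumes a constant daily volume, which is a simplification; the function name is made up for this illustration.

```python
# Figures quoted in the text above.
HEADERS_PER_DAY = 155_000_000
RETENTION_DAYS = 21

def retained_headers(per_day, days):
    """Approximate steady-state row count, assuming a constant daily volume."""
    return per_day * days

total = retained_headers(HEADERS_PER_DAY, RETENTION_DAYS)
print(f"{total:,} message headers retained")
```

Under that assumption a search has roughly 3.26 billion headers to consider, which helps explain why additional indexed fields alone may not be enough.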

Article
· Jul 26, 2017 3m read
What is APM?

I am talking about Application Performance Management at Global Summit, and several people have asked what that means, so it is time for a bit of an explanation.

APM, or Application Performance Management (sometimes referred to as Application Performance Monitoring), has a very good (if complicated) explanation on Wikipedia, but to me it just means looking at performance from the users’ point of view and at the level of service provided to them.

Article
· Sep 30, 2016 1m read
ECP Magic

I saw someone recently refer to ECP as magic. It certainly seems so, and there is a lot of very clever engineering to make it work. But the following sequence of diagrams is a simple view of how data is retrieved and used across a distributed architecture.

For more on ECP, including capacity planning, follow this link: Data Platforms and Performance - Part 7 ECP for performance, scalability and availability


Most transactional applications have a 70:30 RW profile. However, some special cases have extremely high write IO profiles.
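As a quick illustration of what such a profile means in practice, the sketch below splits a total IO rate into reads and writes. It reads 70:30 as 70% reads to 30% writes; the total of 10,000 IOPS and the function name `split_iops` are hypothetical values chosen only for the example, not measurements from the tests.

```python
def split_iops(total_iops, read_percent=70):
    """Split a total IO rate into (reads, writes) for a given read percentage."""
    reads = total_iops * read_percent // 100
    writes = total_iops - reads
    return reads, writes

# Hypothetical total, used only to show the arithmetic.
reads, writes = split_iops(10_000)
print(f"reads={reads} IOPS, writes={writes} IOPS")

# A write-heavy profile simply lowers the read percentage.
heavy_reads, heavy_writes = split_iops(10_000, read_percent=20)
print(f"reads={heavy_reads} IOPS, writes={heavy_writes} IOPS")
```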

I ran storage IO tests in the ap-southeast-2 (Sydney) AWS region to simulate IRIS database IO patterns and throughput similar to a very high write rate application.

The test aimed to determine whether the EC2 instance types and EBS volume types available in the AWS Australian regions will support the high IO rates and throughput required.


Introduction

Database performance has become a critical success factor in modern application environments. Identifying and optimizing the most resource-intensive SQL queries is therefore essential for guaranteeing a smooth user experience and maintaining application stability.

This article will explore a quick approach to analyzing SQL query execution statistics on an InterSystems IRIS instance to identify areas for optimization within a macro-application.

Rather than focusing on real-time monitoring, we will set up a system that collects and analyzes statistics pre-calculated by IRIS once an hour. This approach, while not enabling instantaneous monitoring, offers an excellent compromise between the wealth of data available and the simplicity of implementation.

We will use Grafana for data visualization and analysis, InfluxDB for time series storage, and Telegraf for metrics collection. These tools, recognized for their power and flexibility, will give us a clear and actionable view.

More specifically, we will detail the configuration of Telegraf to retrieve statistics. We will also set up the integration with InfluxDB for data storage and analysis, and create customized dashboards in Grafana. This will help us quickly identify queries requiring special attention.

To facilitate the orchestration and deployment of these various components, we will employ Docker.


It has been noticed that some customers running Java programs (for example, FOP) on AIX see the server gradually run low on memory and eventually run out. The customer notices that the system pages heavily and the user experience degrades, and the server crashes once memory is exhausted.

When the problem happens, ipcs shows many shared memory segments marked for deletion (a capital D at the beginning of the MODE column). This means each segment will not disappear until the last process attached to it detaches.


Windows Subsystem for Linux (WSL) is a feature of Windows that allows you to run a Linux environment on your Windows machine, without the need for a separate virtual machine or dual booting.

WSL is designed to provide a seamless and productive experience for developers who want to use both Windows and Linux at the same time.


What is %SQLRESTRICT

%SQLRESTRICT is a special %FILTER clause for use in MDX queries in InterSystems IRIS Business Intelligence. The leading % indicates that it is a special MDX extension created by InterSystems. It allows users to insert an SQL statement that restricts the records returned in the MDX Result Set. This SQL statement must return a set of Source Record IDs to limit the results by. Please see the documentation for more information.

Why is this useful?

This is useful because users often want to restrict the results in their MDX Result Set based on information that is not in their cubes, sometimes because that information does not make sense to include in the cube. It can also be useful when there is a large set of values you want to restrict by. As mentioned before, this is not a standard MDX function; it was created by InterSystems to handle cases where queries were not performing well or cases that were not easily solved by existing functions.


Index

This is a list of all the posts in the Data Platforms’ capacity planning and performance series, in order. It also includes a general list of my other posts. I will update it as new posts in the series are added.


You will notice that I wrote some posts before IRIS was released and that they refer to Caché. I will revisit the posts over time, but in the meantime, the advice for configuration is generally the same for Caché and IRIS. Some command names may have changed; the most obvious example is that anywhere you see the ^pButtons command, you can replace it with ^SystemPerformance.


While some posts are updated to preserve links, others will be marked as strikethrough to indicate that the post is legacy. Generally, I will say, "See: some other post" if it is appropriate.


Capacity Planning and Performance Series

Generally, posts build on previous ones, but you can also just dive into subjects that look interesting.



Support and sales engineers are often asked about recent benchmark results on various platforms and large-scale configurations. These will be made available here in the Developer Community in the "Documentation" section. As an example, here's a link to a recent Intel E7 v2 series processor benchmark.

https://community.intersystems.com/documentation/data-scalability-intersystems-caché-and-intel-processors-0


The InterSystems technology architect team is often asked about recommended storage arrays or storage technologies. To make this information available to a wider audience as a reference, a new series has been started to share some of the results we have seen with various storage technologies. As a general recommendation, all-flash storage is highly recommended with all InterSystems products to provide the lowest latency and predictable IOPS.

The first in the series was the most recently tested NetApp AFF A300 storage array. This is a mid-tier storage array with several higher models above it. This specific A300 model can support anything from a minimal configuration of only a few drives to hundreds of drives per HA pair, and can also be clustered with multiple controller pairs for tens of PB of disk capacity and hundreds of thousands of IOPS or more.
