Performance

Syndicate content 11 

IRIS and Ensemble are designed to act as an ESB/EAI. This mean they are build to process lots of small messages.

But some times, in real life we have to use them as ETL. The down side is not that they can't do so, but it can take a long time to process millions of row at once.

To improve performance, I have created a new SQLOutboundAdaptor who only works with JDBC.

BatchSqlOutboundAdapter

Extend EnsLib.SQL.OutboundAdapter to add batch batch and fetch support on JDBC connection.

Benchmark

Benchmarks released on Postgres 11.2 with 1 000 000 rows fetched and 100 000 rows inserted on 2 columns.

alt text

Prerequisites

Can be used on IRIS or Ensemble 2017.2+.

Installing

Clone this repositor

0   0 1
0

comments

45

views

0

rating

ˮ This is one of my articles which was never published in English. Let's fix it!

Hello! This article is about quite a practical way of developing InterSystems solutions without using the integrated tools like Studio or Atelier. All the code of the project can be stored in the form of "traditional" source code files, edited in your favorite development environment (for example, Visual Studio Code), indexed by any version control system and arbitrarily combined with many external tools for code analysis, preprocessing, packaging and so on

Last comment 2 March 2019
+ 6   5 3
719

views

+ 6

rating

There are often questions surrounding the ideal Apache HTTPD Web Server configuration for HealthShare.  The contents of this article will outline the initial recommended web server configuration for any HealthShare product. 

As a starting point, Apache HTTPD version 2.4.x (64-bit) is recommended.  Earlier versions such as 2.2.x are available, however version 2.2 is not recommended for performance and scalability of HealthShare.

+ 12   5 3
0

comments

404

views

+ 12

rating

There are three things most important to any SQL performance conversation:  Indices, TuneTable, and Show Plan.  The attached PDFs includes historical presentations on these topics that cover the basics of these 3 things in one place.  Our documentation provides more detail on these and other SQL Performance topics in the links below.  The eLearning options reinforces several of these topics.  In addition, there are several Developer Community articles which touch on SQL performance, and those relevant links are also listed.

There is a fair amount of repetition in the information listed below.  The most important aspects of SQL performance to consider are:

  1. The types of indices available
  2. Using one index type over another
  3. The information TuneTable gathers for a table and what it means to the Optimizer
  4. How to read a Show Plan to better understand if a query is good or bad
Last comment 18 January 2019
+ 5   2 2
240

views

+ 5

rating

Not everyone knows that InterSystems Caché has a built-in tool for code profiling called Caché Monitor.

Its main purpose (obviously) is the collection of statistics for programs running in Caché. It can provide statistics by program, as well as detailed Line-by-Line statistics for each program.

Using Caché Monitor

Let’s take a look at a potential use case for Caché Monitor and its key features. So, in order to start the profiler, you need to go to the terminal and switch to the namespace that you want to monitor, then launch the %SYS.MONLBL system routine:

zn "<namespace>"
do ^%SYS.MONLBL

As the result, you will see the following:

Last comment 14 December 2018
+ 3   7 2
365

views

+ 3

rating

Continuing on with providing some examples of various storage technologies and their performance profiles, this time we looked at the growing trend of leveraging internal commodity-based server storage, specifically the new HPE Cloudline 3150 Gen10 AMD processor-based single socket servers with two 3.2TB Samsung  PM1725a NVMe drives.  

Last comment 29 October 2018
+ 4   0 2
330

views

+ 4

rating

Often InterSystems technology architect team is asked about recommended storage arrays or storage technologies.  To provide this information to a wider audience as reference, a new series is started to provide some of the results we have encountered with various storage technologies.  As a general recommendation, all-flash storage is highly recommended with all InterSystems products to provide the lowest latency and predictable IOPS capabilities.

The first in the series was the most recently tested Netapp AFF A300 storage array.  This is middle-tier type storage array with several higher models above it.  This specific A300 model is capable of supporting a minimal configuration of only a few drives to hundreds of drives per HA pair, and also capable of being clustered with multiple controller pairs for tens of PB's of disk capacity and hundreds of thousands of IOPS or higher. 

+ 3   0 1
0

comments

1039

views

+ 3

rating

In the last post we scheduled 24-hour collections of performance metrics using pButtons. In this post we are going to be looking at a few of the key metrics that are being collected and how they relate to the underlying system hardware. We will also start to explore the relationship between Caché (or any of the InterSystems Data Platforms) metrics and system metrics. And how you can use these metrics to understand the daily beat rate of your systems and diagnose performance problems.

Last comment 24 March 2018
+ 15   0 16
2013

views

+ 15

rating

No doubt bitmap indexing, if used with a suitable property, performs just impressive!
But there is a major limit: ID key has to be a positive integer.
For modern class definitions working with CacheStorage this is a default.

BUT: There are hundreds (thousands ?) old applications out in the field that
are still using composite ID keys.
Or - to name the origin - work on Globals with 2 subscript levels (or more).
They are by construction excluded from our "Bitmap Wonderland".

Of course, using the feature of multiple storage maps could possibly allow to escape - somehow.
But in those cases where you have many many GB of data stored in that way and
where you work with millions of lines of old  - often poorly documented - code.
You wouldn't think seriously more than 5 min. if you change the storage structure or not.
So the bitmap is not with you

Last comment 17 February 2018
+ 7   0 5
357

views

+ 7

rating

APM normally focuses on the activity of the application but gathering information about system usage gives you important background information that helps understand and manage the performance of your application so I am including the Caché History Monitor in this series.

In this article I will briefly describe how you start the Caché History Monitor to build a record of the system level activity to go with the application activity and performance information you gather. I will also give examples of SQL to access the information.

What is the Caché History Monitor?

The Caché History Monitor is an extension of the Caché System Monitor. It keeps a persistent record of the metrics relating to the database activity (e.g. global reads and updates) and system usage (e.g. CPU usage)

Last comment 18 January 2018
+ 2   0 2
420

views

+ 2

rating

practical guide to using the tools PERFMON and MONLBL.

Introduction

When investigating performance problems, I often use the utilities ^PERFMON and ^%SYS.MONLBL to identify exactly where in the application pieces of code are taking a long time to execute. In this short paper I will describe an approach that first uses ^PERFMON to identify the busiest routines and then uses ^%SYS.MONLBL to analyze those routines in detail to show which lines are the most expensive.

The details of ^PERFMON and ^%SYS.MONLBL are individually described in detail in the Caché documentation, but here I summarize when, why and how you can use them together to find the cause of a performance problem in an application

Last comment 29 December 2017
+ 4   0 11
547

views

+ 4

rating

Using the CSP Page Statistics

Application Performance Management

Introduction

A key part of Application Performance Management (APM) is recording the activity and performance of user activity. For many web applications the closest you can get to this is to record the CSP pages or CSP based services being dispatched.

If the pages or service names are meaningful and they indicate the business activity being performed the CSP page statistics can be very useful in building up a historical record of activity, performance and resource usage. This allows you to recognize trends in activity for planning purposes. It allows you to see gradually degrading performance or increasing resource usage so you can take action before you have a crisis. Also, if you do have a performance crisis you can look back to see what has changed since the performance was fine

Last comment 11 November 2017
+ 5   0 2
312

views

+ 5

rating

This post will show you an approach to size shared memory requirements for database applications running on InterSystems data platforms including global and routine buffers, gmheap, and locksize as well as some performance tips you should consider when configuring servers and when virtualizing Caché applications. As ever when I talk about Caché I mean all the data platform (Ensemble, HealthShare, iKnow and Caché).


A list of other posts in this series is here

Last comment 22 August 2017
+ 18   2 11
5072

views

+ 18

rating

Hyper-Converged Infrastructure (HCI) solutions have been gaining traction for the last few years with the number of deployments now increasing rapidly. IT decision makers are considering HCI when scoping new deployments or hardware refreshes especially for applications already virtualised on VMware. Reasons for choosing HCI include; dealing with a single vendor, validated interoperability between all hardware and software components, high performance especially IO, simple scalability by addition of hosts, simplified deployment and simplified management

Last comment 16 August 2017
+ 7   0 6
1804

views

+ 7

rating

In this short article we talk about how to get Yape running in a docker container to avoid having to setup python on your machine.

It's been a while since the last article in this series, so let's recap quickly.

We talked about using matplotlib to create a basic graph. Afterwards we introduced dynamic graphs using bokeh.
In the 3rd part we talked about generating heatmaps using monlbl data

Last comment 7 August 2017
+ 5   0 3
433

views

+ 5

rating

Globals, these magic swords for storing data, have been around for a while, but not many people can use them efficiently or know about this super-weapon altogether.

If you use globals for tasks where they truly shine, the results may be amazing, either in terms of increased performance or dramatic simplification of the overall solution (1, 2).

Globals offer a special way of storing and processing data, which is completely different from SQL tables. They were first introduced in 1966 in the M(UMPS) programming language, which was initially used in medical databases. It is still used in the same way, but has also been adopted by some other industries where reliability and high performance are top priorities: finance, trading, etc.

Later M(UMPS) evolved into Caché ObjectScript (COS). COS was developed by InterSystems as a superset of M. The original language is still accepted by developers' community and alive in a few implementations. There are several signs of activity around the web: MUMPS Google group, Mumps User's group), effective ISO Standard, etc.

Modern global based DBMS supports transactions, journaling, replication, partitioning. It means that they can be used for building modern, reliable and fast distributed systems.

Globals do not restrict you to the boundaries of the relational model. They give you the freedom of creating data structures optimized for particular tasks. For many applications reasonable use of globals can be a real silver bullet offering speeds that developers of conventional relational applications can only dream of.

Globals as a method of storing data can be used in many modern programming languages, both high- and low-level. Therefore, this article will focus specifically on globals and not the language they once came from.

Last comment 31 July 2017
+ 12   0 4
1224

views

+ 12

rating

What is APM?

I am talking about Application Performance Management at global summit, and several people have asked what that means so it is time for a bit of an explanation.

APM or Application Performance Management (sometimes referred to as Application Performance Monitoring) has a very good (if complicated) explanation on Wikipedia but to me it just means looking at performance from the users’ point of view and the level of service provided to them.

The ‘user’ in this context could mean a line of business owner who is paying for the application or it could mean someone in front of a screen using the application to do their job. These users aren’t really interested in CPU usage, IOPS or buffer cache efficiency.  And if they are talking about some technical detail of the application it is nearly always bad news, because it is the thing that is making their lives hard

+ 3   0 2
0

comments

227

views

+ 3

rating

In the previous parts (1, 2) we talked about globals as trees. In this article, we will look at them as sparse arrays.

A sparse array - is a type of array where most values assume an identical value.

In practice, you will often see sparse arrays so huge that there is no point in occupying memory with identical elements. Therefore, it makes sense to organize sparse arrays in such a way that memory is not wasted on storing duplicate values.

In some programming languages, sparse arrays are part of the language - for example, in J, MATLAB. In other languages, there are special libraries that let you use them. For C++, those would be Eigen and the like.

Globals are good candidates for implementing sparse arrays for the following reasons:

Last comment 17 July 2017
+ 6   0 5
532

views

+ 6

rating

Beginning - see Part 1.

 

3. Variants of structures when using globals

 

A structure, such as an ordered tree, has various special cases. Let's take a look at those that have practical value for working with globals.

 

 

 

 

3.1 Special case 1. One node without branches

 

Last comment 8 July 2017
+ 6   0 5
739

views

+ 6

rating

Quotes (1NF/2NF/3NF)ru:

Every row-and-column intersection contains exactly one value from the applicable domain (and nothing else).
The same value can be atomic or non-atomic depending on the purpose of this value. For example, “4286” can be
  • atomic, if its denotes “a credit card’s PIN code” (if it’s broken down or reshuffled, it is of no use any longer)
  • non-atomic, if it’s just a “sequence of numbers” (the value still makes sense if broken down into several parts or reshuffled)

This article explores the standard methods of increasing the performance of SQL queries involving the following types of fields: string, date, simple list (in the $LB format), "list of <...>" and "array of <...>".

+ 6   0 3
0

comments

361

views

+ 6

rating

This week I am going to look at CPU, one of the primary hardware food groups :) A customer asked me to advise on the following scenario; Their production servers are approaching end of life and its time for a hardware refresh. They are also thinking of consolidating servers by virtualising and want to right-size capacity either bare-metal or virtualized. Today we will look at CPU, in later posts I will explain the approach for right-sizing other key food groups - memory and IO.

So the questions are

Last comment 7 June 2017
+ 12   0 4
2702

views

+ 12

rating

Note (Sept 2018): There have been big changes since this post first appeared, I suggest using the Docker Container version, the project and details for running as a container are still in the same place  published on GitHub so you can download, run - and modify if you need to

Last comment 11 May 2017
+ 8   1 5
882

views

+ 8

rating

Code coverage and performance optimization of code has come up a bunch of times already, so most of you should already be aware of the SYS.MONLBL utility.
Often a visual approach to looking at code is much more intuitive than pure numbers, which is pretty much the whole point of this article series. This time we will take a slight excursion away from python and its tools and are going to explore generating heatmaps from ^%SYS.MONLBL reports.

As a quick reminder a heatmap is just a specific visualization tool, which gives us an overview of data where colors represent a certain value. In our case the data is going to be lines of code, with the time being spent in them mapped to colors

+ 3   1 0
0

comments

193

views

+ 3

rating

 It's almost time to get your customers upgraded to new versions - are you worried about showing off your SQL Performance after upgrades?  If you want to upgrade without worrying, then I have just the program for you!!!  Check out this video from Global Summit 2016 featuring yours truly explaining how to upgrade a system without worrying about pesky SQL queries showing on your waistline!  

https://www.youtube.com/watch?v=GfFPYfIoR_g

Unfortunately the video started after the Frozen Musical Sing-a-long, but it's 30 minutes of the most fun you'll have while learning tools and tips for Caché SQL!

Questions?  Comments?  You know you can leave them here!

 

TL;DW: If you're on a version older than 2016.2, then you'll need to do a lot of testing.  Once you hit 2016.2, upgrades are easy as "Hello World!"

 

 

0   0 2
0

comments

132

views

0

rating

In previous posts I have shown how it is possible to collect historical performance metrics using pButtons. I go to pButtons first because I know it is installed with every Data Platforms instance (Ensemble, Caché, …). However there are other ways to collect, process and display Caché performance metrics in real time either for simple monitoring or more importantly for much more sophisticated operational analytics and capacity planning. One of the most common methods of data collection is to use SNMP (Simple Network Management Protocol). SNMP a standard way for Caché to provide management and monitoring information to a wide variety of management tools. The Caché online documentation includes details of the interface between Caché and SNMP. While SNMP should 'just work' with Caché there are some configuration tricks and traps.

Last comment 17 March 2017
+ 9   1 2
2113

views

+ 9

rating