#Monitoring

5 Followers · 180 Posts

Monitoring is the process of tracking and managing the performance and availability of software applications.

Article Mikhail Khomenko · Feb 13, 2017 14m read

This post is dedicated to the task of monitoring a Caché instance using SNMP. Some Caché users are probably already doing this in one way or another. SNMP monitoring has long been supported by the standard Caché package, but not all the necessary parameters are available “out of the box”. For example, it would be nice to monitor the number of CSP sessions, get detailed information about license usage, track particular KPIs of the system in use, and so on. After reading this article, you will know how to add your own parameters to Caché monitoring using SNMP.

Article Jean-Luc Delporte · Mar 27, 2017 2m read

Hi Community,

This article assumes that you are familiar with Zabbix and SNMP monitoring. If not, there are some very interesting posts on the Community, especially this one (https://community.intersystems.com/post/creating-custom-snmp-oids), which contains a lot of information on how to configure and query a Cache server over SNMP.

With Cache, all MIB data objects are organized into tables, so you need to know the index of an object to be able to get its value.
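
To illustrate why the index matters, here is a minimal Python sketch of how a table cell's OID is usually assembled in SNMP: entry OID, then column number, then the encoded index. It assumes the index is a variable-length string, which standard SMIv2 encodes as its length followed by one sub-identifier per byte. The entry OID, the column number and the index value `CACHE` are placeholders for illustration, not values taken from the Cache MIB.

```python
def encode_string_index(value: str) -> str:
    """Encode a variable-length string index as OID sub-identifiers:
    the length first, then one number per ASCII byte."""
    data = value.encode("ascii")
    return ".".join(str(n) for n in (len(data), *data))


def table_cell_oid(entry_oid: str, column: int, index: str) -> str:
    """Build the full OID of one table cell: entry OID + column + encoded index."""
    return f"{entry_oid}.{column}.{encode_string_index(index)}"


# Hypothetical entry OID and column number, for illustration only.
ENTRY_OID = "1.3.6.1.4.1.99999.1.1.1"
print(table_cell_oid(ENTRY_OID, 2, "CACHE"))
# prints 1.3.6.1.4.1.99999.1.1.1.2.5.67.65.67.72.69
```

Whether a given table is indexed by a string or by an integer depends on its MIB definition, so check the MIB file before relying on this encoding.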

Article Suriya Narayanan Suriya Narayanan Vadivel Murugan · Nov 12, 2016 5m read

In this article, we will discuss Orphaned Messages.

What is an Orphaned Message?

Every message body is associated with a message Header, which holds the metadata. The Header holds information such as the source configuration name, target configuration name, time created, time processed, associated message body reference, session information, message body class name, and message status. Message body records that have no corresponding Header records are called orphaned message bodies. We will discuss possible causes that can lead to orphaned message bodies.

Article Murray Oldfield · Apr 27, 2016 11m read

InterSystems Data Platforms and performance - Part 5 Monitoring with SNMP

In previous posts I have shown how it is possible to collect historical performance metrics using pButtons. I go to pButtons first because I know it is installed with every Data Platforms instance (Ensemble, Caché, …). However, there are other ways to collect, process and display Caché performance metrics in real time, either for simple monitoring or, more importantly, for much more sophisticated operational analytics and capacity planning.

Article Mikhail Khomenko · May 15, 2017 12m read

Prometheus is one of the monitoring systems adapted for collecting time series data.

Its installation and initial configuration are relatively easy. The system has a built-in graphic subsystem called PromDash for visualizing data, but developers recommend using a free third-party product called Grafana. Prometheus can monitor a lot of things (hardware, containers, various DBMSs), but in this article, I would like to take a look at the monitoring of a Caché instance (to be exact, it will be an Ensemble instance, but the metrics will be from Caché). If you are interested – read along.

Question Sergey Pavlov · Sep 3, 2021

UPDATE:
It turns out it was just me being a dummy, and the snmpd was correctly telling me there is no value associated with that exact key. I should have used snmpwalk instead of snmpget to display the whole tree.
Original Post follows:
Hello!
I'm trying to set up SNMP monitoring on Caché, using documentation and this article
I'm running net-snmp on Red Hat Enterprise Linux Server release 7.3 (with CentOS repositories), and Caché version 2017.1
It looks like snmpd is running as AgentX master, and Caché subagent is running too
/opt/cache/mgr/SNMP.
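
The difference described in the UPDATE above can be sketched with a toy model in Python. This is not the net-snmp implementation; the OIDs and values are hypothetical. It only shows why an exact-match lookup on a non-leaf OID finds nothing while a walk over the same prefix returns the whole subtree.

```python
# Toy MIB: only leaf instances carry values (OIDs are hypothetical).
MIB = {
    (1, 3, 6, 1, 4, 1, 99999, 1, 1, 0): "instance-a",
    (1, 3, 6, 1, 4, 1, 99999, 1, 2, 0): 42,
}


def get(oid):
    """Exact-match lookup, like a GET request: returns None when
    no value is stored at exactly this OID."""
    return MIB.get(oid)


def walk(prefix):
    """Return every (oid, value) pair under the prefix in OID order,
    which is what a walk built on repeated GETNEXT requests yields."""
    return [(oid, val) for oid, val in sorted(MIB.items())
            if oid[:len(prefix)] == prefix]


subtree = (1, 3, 6, 1, 4, 1, 99999, 1)
print(get(subtree))   # None: no value at the subtree root itself
print(walk(subtree))  # both leaf instances below it
```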

Article Mikhail Khomenko · Aug 16, 2017 20m read

Hello! This article continues the article "Making Prometheus Monitoring for InterSystems Caché". We will take a look at one way of visualizing the results of the work of the ^mgstat tool. This tool provides the statistics of Caché performance, and specifically the number of calls for globals and routines (local and over ECP), the length of the write daemon’s queue, the number of blocks saved to the disk and read from it, amount of ECP traffic and more. ^mgstat can be launched separately (interactively or by a job), and in parallel with another performance measurement tool, ^pButtons.

Article Cindy Olsen · Nov 8, 2016 7m read

In this post I would like to talk about the syslog table.  I will cover what it is, how you look at it, what the entries really are, and why it may be important to you.  The syslog table can contain important diagnostic information.  If your system is having any problems, it is important to understand how to look at this table and what information is contained there.

Article Chad Severtson · Apr 12, 2023 8m read

Spoilers: Daily Integrity Checks are not only a best practice, but they also provide a snapshot of global sizes and density. 
Update 2024-04-16:
  As of IRIS 2024.1, many of the utilities below now offer a mode to estimate the size with <2% error on average, with orders-of-magnitude improvements in performance and IO requirements. I continue to urge regular Integrity Checks; however, there are situations where more urgent answers are needed.

  • EstimatedSize^%GSIZE - Runs %GSIZE in estimation mode.
  • ##class(%Library.GlobalEdit).GetGlobalSize(directory, globalname, .allocated, .used.
Article Henrique Dias · Aug 20, 2019 2m read

Hi, everyone!


I want to share a personal project that started with a simple request at work: 

Is it possible to know how many Caché licenses we are using?

Reading other articles here in the community, I found this excellent article by David Loveluck:


APM - Using the Caché History Monitor
https://community.intersystems.com/post/apm-using-cach%C3%A9-history-monitor

So, using David's article, I started using the Caché History Monitor to show all that information.

When facing the question: Which cool tech should I use?

Article Murray Oldfield · Nov 14, 2019 6m read

Released with no formal announcement in IRIS preview release 2019.4 is the /api/monitor service exposing IRIS metrics in Prometheus format. Big news for anyone wanting to use IRIS metrics as part of their monitoring and alerting solution. The API is a component of the new IRIS System Alerting and Monitoring (SAM) solution that will be released in an upcoming version of IRIS.

However, you do not have to wait for SAM to start planning and trialling this API to monitor your IRIS instances.
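
As a rough idea of what consuming Prometheus-format output involves, here is a minimal Python sketch that parses the text format into a dictionary. The metric names in the sample are made up for illustration and are not taken from the IRIS API. The parser assumes each sample line ends with its value and carries no trailing timestamp.

```python
def parse_metrics(text: str) -> dict:
    """Parse Prometheus text-format samples into {series: value}.
    Comment lines (# HELP, # TYPE) and blank lines are skipped."""
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the last space: label values may themselves contain spaces.
        series, _, value = line.rpartition(" ")
        metrics[series] = float(value)
    return metrics


# Hypothetical sample payload, not real IRIS output.
SAMPLE = """\
# HELP example_cpu_usage Hypothetical metric for illustration
# TYPE example_cpu_usage gauge
example_cpu_usage 12
example_db_size_mb{id="USER"} 11.5
"""

for series, value in parse_metrics(SAMPLE).items():
    print(series, value)
```

In practice a Prometheus server does this scraping for you; a hand-written parser like this is mainly useful for quick checks or for feeding the numbers into another tool.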

Article Murray Oldfield · Feb 20, 2017 3m read

Note (October 2022): yape has been deprecated and replaced by YASPE; there is no more development on yape.


Note (June 2019): A lot has changed; for the latest details, go here

Note (Sept 2018): There have been big changes since this post first appeared. I suggest using the Docker Container version. The project and details for running as a container are still in the same place, published on GitHub, so you can download, run - and modify if you need to.

Article David Loveluck · Nov 20, 2017 5m read

APM normally focuses on the activity of the application, but gathering information about system usage gives you important background that helps you understand and manage the performance of your application, so I am including the IRIS History Monitor in this series.

In this article I will briefly describe how you start the IRIS or Caché History Monitor to build a record of the system level activity to go with the application activity and performance information you gather. I will also give examples of SQL to access the information.

What is the IRIS or Caché History Monitor?

Article David Loveluck · Feb 25, 2019 4m read

There have been some very helpful articles in the community that show how to use Grafana with IRIS (or Cache/Ensemble) by using an intermediate database.

But I wanted to get at IRIS structures directly. In particular, I wanted to access the Cache History Monitor data that is accessible by SQL, as described here

https://community.intersystems.com/post/apm-using-cach%C3%A9-history-mo…

and didn't want anything between me and the data.

I already had class queries that returned the data I wanted, so I just needed to embed them in a REST class that returned JSON. I haven't included my class Grafana.

Article David Loveluck · Aug 27, 2019 28m read

Since Caché 2017, the SQL engine has included a new set of statistics. These record the number of times a query is executed and the time it takes to run.

This is a gold mine for anyone monitoring and trying to optimize the performance of an application that includes many SQL statements, but it isn’t as easy to access the data as some people want.

This article and the associated sample code explain how to use this information and how to routinely extract a summary of daily statistics and keep a historic record of the SQL performance of your application.

What is recorded?

Article Carter Tiernan · Dec 22, 2016 2m read

Customizable System Monitoring.

Introduction

The Polymetric Dashboard is a stand-alone module that provides enhanced monitoring tools for a Caché environment. Equipped with over one hundred sensors that monitor key system metrics, a robust REST API, and a modular AngularJS user interface, the Polymetric Dashboard is fully functional out of the box. However, the Polymetric Dashboard is designed to be customizable; any system metric can be monitored by creating a new sensor, and the visualization of collected data can be tailored to specific requirements and purposes.

Article Murray Oldfield · Nov 18, 2019 8m read

The following steps show you how to display a sample list of metrics available from the /api/monitor service.

In the last post, I gave an overview of the service that exposes IRIS metrics in Prometheus format. The post shows how to set up and run IRIS preview release 2019.4 in a container and then list the metrics.


This post assumes you have Docker installed. If not, go and do that now for your platform :)


Step 1. Download and run the IRIS preview in docker

Follow the download instructions at Preview Distributions to download the Preview Licence Key and an IRIS Docker image

Article Eduard Lebedyuk · Sep 9, 2019 1m read

Just wanted to share my Zabbix template for monitoring InterSystems IRIS on Linux servers.

It monitors irisusr (configurable) memory consumption:

  • Virtual memory size
  • Percentage of real memory
  • Resident set size
  • Size of data segment
  • Size of code segment
  • Peak resident set size
  • Size of locked memory
  • Size of shared libraries
  • Peak virtual memory size
  • Size of pinned pages
  • Size of page table entries
  • Size of process code + data + stack segments
  • Size of stack segment
  • Size of swap space used

How to use:

  1. Check that you have Zabbix installed (I'm using version 4.
Article Sean McKenna · Aug 26, 2016 8m read

Enterprise Monitor is a component of Ensemble and can help organizations monitor multiple productions running on different namespaces within the same instance or namespaces running on multiple instances.

Documentation can be found at:

http://docs.intersystems.com/ens20161/csp/docbook/DocBook.UI.Page.cls?KEY=EMONITOR_all#EMONITOR_enterprise

In Ensemble 2016.1 there were changes made to make this utility work with HealthShare environments.

This article will:

  • Show how to set up Enterprise Monitor for HealthShare sites
  • Show some features of Enterprise Monitor
  • Show some features of Enterprise Message Viewer

For this article, I used the following version of HealthShare:

Cache for Windows (x86-64) 2016.1 (Build 656U) Fri Mar 11 2016 17:42:42 EST [HealthShare Modules:Core:14.02.2415 + Linkage Engine:14.02.2415 + Patient Index:14.02.2415 + Clinical Viewer:14.02.2415 + Active Analytics:14.02.2415]

Article Jon Sue-Ho · Nov 17, 2016 2m read

The MONITOR process (also called the Caché Monitor) scans the messages in your cconsole.log file and sends you emails based on the severity of those messages.  The MONITOR is configured using the ^MONMGR utility in terminal.

The MONITOR should not be confused with the similarly named System Monitor, which checks a variety of system health and performance metrics and can log messages regarding them to the cconsole.log, where they can then be scanned by the MONITOR.

The process begins automatically at Caché startup and scans your cconsole.
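
The scanning step described above can be sketched in Python as follows. This is not the MONITOR's own logic. It assumes each log line has the shape `date-time (pid) severity message` with a single-digit severity where higher means more serious; the sample lines and the threshold are invented for illustration, and no email is sent.

```python
import re

# Assumed line shape: "<timestamp> (<pid>) <severity digit> <message>".
LINE_RE = re.compile(r"^(\S+) \((\d+)\) (\d) (.*)$")


def alerts(lines, min_severity=2):
    """Return (severity, message) for every line at or above the threshold.
    Lines that do not match the assumed shape are ignored."""
    found = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and int(match.group(3)) >= min_severity:
            found.append((int(match.group(3)), match.group(4)))
    return found


# Invented sample lines, not copied from a real log.
SAMPLE = [
    "11/17/16-10:00:00:000 (1234) 0 Example informational message",
    "11/17/16-10:05:00:000 (1234) 2 Example severe message",
]
print(alerts(SAMPLE))  # [(2, 'Example severe message')]
```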

Article Lorenzo Scalese · Aug 16, 2023 11m read

Hi developers!

Today I would like to address a subject that has given me a hard time. I am sure many of you have run into it already: the so-called “bottleneck”. Since this is a broad topic, this article will only focus on identifying incoming HTTP requests that could be causing slowness issues. I will also provide you with a small tool I have developed to help identify them.

Our software is becoming more and more complex, processing a large number of requests from different sources, be it front-end or third-party back-end applications. To ensure optimal performance, it is essential to have a logging system capable of taking a few key measurements, such as the response time, the number of global references and the number of lines of code executed for each HTTP response. As part of my work, I get involved in the development of EMR software as well as incident analysis.  Since user load comes mostly from HTTP requests (REST API or CSP application), the need to have this type of measurement when generalized slowness issues occur has become obvious.

Article Tani Frankel · May 17, 2016 1m read

One of the topics that comes up often when managing Ensemble productions is disk space:

The database (the CACHE.DAT file) grows at an unexpected rate; or the Journal files build up at a fast pace; or the database grows continuously even though the system has a scheduled purge of the Ensemble runtime data.

It would be better if these kinds of phenomena were observed and accounted for at the development and testing stage rather than on a live system.

For this purpose I created a basic framework that could aid in this task.

Announcement Shane Nowack · May 23, 2022

Hello IRIS Community,

InterSystems Certification is developing a certification exam for IRIS system administrators and, if you match the exam candidate description given below, we would like you to beta test the exam. The exam will be available for beta testing on June 20-23, 2022 at InterSystems Global Summit 2022, but only for Summit registrants (visit this page to learn more about Certification at GS22). Beta testing will open for all other interested beta testers on July 18, 2022. However, interested beta testers should sign up now by emailing certification@intersystems.com.

Question Glenn van Bavel · May 4, 2017

Whenever the Windows SNMP Service restarts, the snmpdbg log says the following. 

13:08:59 :Attempting initial TCP connection(s) with 1 Cache instances ...
13:08:59 :Get connection with ENSEMBLE on port 1972
13:08:59 :Connection refused on port 1972, check if Cache instance ENSEMBLE is started.
13:08:59 :Cache iscsnmp.dll initialized for 1 configs

Ensemble and all productions are running. I've set up Caché SNMP agent on many other servers in our company and those are working fine. However this one server won't budge. 

Does anyone have any idea what the problem may be here? 

Regards,

Glenn

Article Eduard Lebedyuk · Feb 9, 2024 6m read

Welcome to the next chapter of my CI/CD series, where we discuss possible approaches toward software development with InterSystems technologies and GitLab. Today, we continue talking about Interoperability, specifically monitoring your Interoperability deployments. If you haven't yet, set up Alerting for all your Interoperability productions to get alerts about errors and production state in general.

Inactivity Timeout is a setting common to all Interoperability Business Hosts. A business host has an Inactive status after it has not received any messages within the number of seconds specified by the Inactivity Timeout field. The production Monitor Service periodically reviews the status of business services and business operations within the production and marks the item as Inactive if it has not done anything within the Inactivity Timeout period. The default value is 0 (zero). If this setting is 0, the business host will never be marked Inactive, no matter how long it stands idle.
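
The rule in the paragraph above can be written as a small Python function. This is only a sketch of the described behavior, not the production Monitor Service code; the function and parameter names are hypothetical, and treating the exact boundary (idle time equal to the timeout) as Inactive is an assumption.

```python
from datetime import datetime, timedelta


def is_inactive(last_activity: datetime, now: datetime,
                inactivity_timeout: int) -> bool:
    """Return True if the host has been idle for at least the timeout.
    A timeout of 0 disables the check, so the host is never Inactive."""
    if inactivity_timeout <= 0:
        return False
    return now - last_activity >= timedelta(seconds=inactivity_timeout)


now = datetime(2024, 2, 9, 12, 0, 0)
last = now - timedelta(minutes=10)
print(is_inactive(last, now, 300))  # True: idle 600 s with a 300 s timeout
print(is_inactive(last, now, 0))    # False: 0 means never mark Inactive
```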

Article David Loveluck · Dec 15, 2017 9m read

A practical guide to using the tools PERFMON and MONLBL.

Introduction

When investigating performance problems, I often use the utilities ^PERFMON and ^%SYS.MONLBL to identify exactly where in the application pieces of code are taking a long time to execute. In this short paper I will describe an approach that first uses ^PERFMON to identify the busiest routines and then uses ^%SYS.MONLBL to analyze those routines in detail to show which lines are the most expensive.

The details of ^PERFMON and ^%SYS.

Question Kevin Mayfield · Nov 19, 2016

Internally we use Splunk for monitoring applications and the network.

Does Ensemble have a way of exposing internal metrics and/or a way of exposing custom built metrics? 

I've used DeepSee dashboards in the past to monitor Apache Tomcat/Apache Camel/hawtio using JMX REST calls. This is the other way around, and ideally I'd like to expose metrics on:

  • HL7v2 messages (broken down into types)
  • Production performance
  • Error and Warning.

I understand Ensemble 2016.2 includes a Java 8 JVM, and I was wondering if the JMX route (plus hawtio) is the way to do this?
