#Monitoring

5 Followers · 182 Posts

Monitoring is a process of controlling and management of performance and availability of software applications.

Article David Loveluck · Nov 8, 2017 2m read

Application Performance Monitoring

Tools in InterSystems technology

Back in August in preparation for Global Summit I published a brief explanation of Application Performance Management (APM). To follow up on that I have written and will be publishing over the coming weeks a series of articles on APM.

One major element of APM is the construction of a historic record of application activity, performance and resource usage. Crucially for APM the measurement starts with the application and what users are doing with the application.

1
0 681
Article David Loveluck · Nov 8, 2017 5m read

Using the CSP Page Statistics

Application Performance Management

Introduction

A key part of Application Performance Management (APM) is recording the activity and performance of user activity. For many web applications the closest you can get to this is to record the CSP pages or CSP based services being dispatched.

If the pages or service names are meaningful and they indicate the business activity being performed the CSP page statistics can be very useful in building up a historical record of activity, performance and resource usage.

1
1 789
Question Jon Astle · Nov 9, 2017

I have Ensemble/Healthshare running in a production environment which is setup with a mirror failover and an arbiter sitting between them.

In the event of a failover we have a number of connections that need stopping/monitoring and starting in a certain order.

Is there a programmatic way we can detect the failover and stop certain services and operations immediately and then start them up again in the required order, checking their connection state before starting the next connection.

I am thinking Ens.Director is probably what I need however I need some guidance on how to implement a solution.

2
0 650
Question Hans Rietveld · Aug 29, 2017
Caché Version String: Cache for UNIX (Red Hat Enterprise Linux for x86-64) 2016.2.1

We have a mirrored Ensemble system (110,  backup and 210, primary). At one time (14:00) there is a disruption in the production. The messages are not being processed. 

Looking at the pButtons (every 10 seconds) I see the following abnormal at the WDphase


and the backup

The different values of WDphase are:

0: Idle (WD is not running)

5: WD is updating the Write Image Journal (WIJ) file.

7: WD is committing WIJ and Journal.

8: Databases are being updated.

3
0 926
Question Shawn McCartt · Aug 16, 2017

Is it possible to dynamically adjust the RetryInterval andFailureTimeout settings in a BPL?

I've got a business process that calls a web service operation to get a session ID from an external system.  There is a string property returned in the body of the response that indicate an exception occurred in the external system. I have code in the BPL that examines the property and sets the status property to an error status when that occurs.

Depending on what the value is I want to adjust the RetryInterval and FailureTimeout values used in by the system when the ReplyCodeActions is set to E=RD.

3
0 884
Question Mack Altman · Aug 11, 2017

Please excuse my ignorance. I am trying to identify what areas would be best to review in the System Dashboard (for Cache 2010.2) for performance issues with the database. It seems to be running slower than usual, but I am trying to find out the best way to go about identifying what the issue is.

The following are captures from the System Dashboard.

As always, thanks a lot for your help.

System Dashboard

3
0 658
Question Alexey Maslov · May 11, 2017

Since most of our customers moved to Caché 2015.1, some admins became abused with CPUPct warnings (sometimes alerts) in console log without other signs of lacking CPU power.
Documentation states that:

          CPUPct               job_type              CPU usage (percent) by all processes of the listed job type in aggregate       

What does it really mean?
E.g., if total system CPU usage is 25%, and all running processes are of the same type (e.g, CSPSRV), would CPUPct be equal to 100%? If so, why this case should be a reason for alert?

4
0 794
Article Michael Brady · May 4, 2017 4m read

Hi Everyone,

Link to webinar recording: https://learning.intersystems.com/course/view.php?id=623

Thank you for your interest in this webinar!

Please submit questions about the content of the webinar as comments below this article.

The webinar covers two topics: monitoring message activity and volume and monitoring disk space. I summarize and motivate the topics in the following two sections. The information in this webinar and post applies to the Ensemble, Health Connect, and HealthShare products. For simplicity, I am only going to refer to Ensemble in this post.

2
1 697
Question Stephen Wilson · Apr 5, 2017

I know there's a whole chapter on the subject but I would love a super simple video demo or sample configuration or training course. The myriad menu of options and unfamiliar prompts can make it a bit daunting. The challenge is simple. Send an email notification if the license usage exceeds n% LU consumption.  Why? A recent software change seemed to be responsible for causing the LU total consumption to reach 100%. That means users can't get logged in and support staff can't access the System Management Portal. A pretty daunting situation I am sure you all would agree.

3
0 568
Article Jean-Luc Delporte · Mar 27, 2017 2m read

Hi Community,

This article assumes that you are familiar with Zabbix and SNMP monitoring, if not, there are some very interesting posts on the Community, especially this one (https://community.intersystems.com/post/creating-custom-snmp-oids) which contains a lot of information on how to configure and request an SNMP Cache server.  

With Cache, all MIB data objects are organized into tables, so you need to know the index of an object to be able to get its value.

0
0 7596
Article Carter Tiernan · Dec 22, 2016 2m read

> Customizable System Monitoring. ## Introduction The Polymetric Dashboard is a stand-alone module that provides enhanced monitoring tools for a Caché environment. Equipped with over one hundred sensors that monitor key system metrics, a robust REST API, and a modular AngularJS user interface, the Polymetric Dashboard is fully functional out of the box. However, the Polymetric Dashboard is designed to be customizable; any system metric can be monitored by creating a new sensor, and the visualization of collected data can be tailored to specific requirements and purposes.

20
1 1761
Article Fabian Haupt · Feb 10, 2017 6m read

In last week's discussion we created a simple graph based on the data input from one file. Now, as we all know, sometimes we have multiple different datafiles to parse and correlate. So this week we are going to load additional perfmon data and learn how to plot that into the same graph. Since we might want to use our generated graphs in reports or on a webpage, we'll also look into ways to export the generated graphs.

Loading windows perfmon data

The perfmon data extracted from standard pbuttons report is a bit of a peculiar data format. On first glance it is a pretty straightforward csv file.

0
0 1145
Question Mack Altman · Jan 21, 2017

Can someone direct me to where in the documentation we can find how consumption may be calculated for global storage?

Caché Version 2010.1
Operating System HP OpenVMS 8.4

EDIT: After receiving some responses, it seems I was unclear in my initial inquiry. I am looking to determine our rate of consumption of storage; however, I am having some difficulty in doing that.

While utilizing ^%GSIZE, which is used by the %GlobalEdit class, the results appeared odd. I have provided my results below, which illustrate the global structure on the left and the usage indicated by ^%GSIZE on the right.

6
0 927
Question Laura Cavanaugh · Jan 20, 2017

My group needs to be able to monitor items / tasks, and let a non-management-portal user see the monitoring.  Is it possible to run DeepSee queries on Production items?  I feel like I should not be recreating the production environment or the task manager just so that I can query on the items that are running, and on their states (like "successful" or "send email").

Also, I need to log custom events for each task, and I'm running into difficulties with the task manager in this regard; hence the question about using the Production instead, but querying it.

Thanks,

Laura

1
0 408
Question Kevin Mayfield · Nov 19, 2016

Internally we use splunk for monitoring applications and network.

Does Ensemble have a way of exposing internal metrics and/or a way of exposing custom built metrics? 

I've used Deepsee dashboards in the past to monitor Apache Tomcat/Apache Camel/hawtio using JMX rest calls. This is the other way around and ideally I'd like to expose metrics on:

  • HL7v2 messages (broken down into types)
  • Production performance
  • Error and Warning.

Understand Ensemble 2016.2 includes Java 8 JVM and was wondering if the JMX route (plus hawtio) is the way to do this?

3
0 1323
Article Cindy Olsen · Nov 8, 2016 7m read

In this post I would like to talk about the syslog table.  I will cover what it is, how you look at it, what the entries really are, and why it may be important to you.  The syslog table can contain important diagnostic information.  If your system is having any problems, it is important to understand how to look at this table and what information is contained there.

1
2 2997
Question Tirthankar Bachhar · Nov 4, 2016

As per the documentation of QueueCountAlert:
Number of messages on this item's queue needed to trigger an Alert message to be sent. Note that no further alerts will be sent unless the number of messages on the queue drops below 80% of this number and then rises again to this number.
Note that this alert will be sent even if AlertOnError is False.
Zero means no alerts of this type will be sent.
Now, the question is,
If QueueCountAlert is set to 10, and the queue size become 11 we will be getting email once.

6
0 608
Question Tirthankar Bachhar · Oct 20, 2016

Hi,

When we write unit test cases for cache object script code using %UnitTest.TestCase, what  is the best way to write code to identify code coverage?

So, let say my unit test case hit all 10 lines of code of a method for a given class. So, unit test coverage should be 100% for that. But, using line-by-line coverage [(%Monitor.System.LineByLine] getting wrong percentage, because it also includes code comment/documentation as part of code. So, practically we can not ever achieve 100% of code coverage by using this API.

I'm not sure, if am able to describe the problem properly here.

2
0 962
Question Scott Roth · May 9, 2016

At the Global Summit several folks had mention that they developed their own production monitor. I am looking to create a monitor similar to eGate that we only display those Services/Processes/Operations that are in trouble, and those Errors that are showing up in the Event Log. Does anyone have any examples of this?

Thanks

Scott Roth

The Ohio State University Wexner Medical Center

3
1 735
Question Greg Billington · Jul 14, 2016

I am looking for a database management tool I would have expected to find something like on the SMP website

Aim

show current database usage (ie size allocation) by database then table etc and allow continued drill down,

show information as a table, so can then sort by size to find the biggest item easily

also show it graphically

And then have ability to track and trend growth in size over time

identify a normal growth pattern

alert if variation (higher or lower) from normal based on recent trend

is there a tool that can do this, or a 3rd party tool, even it can only operate via standard SQL JDBC/ODBC to get a partial view.

2
0 1250
Article Andrew Neilson · Jun 9, 2016 1m read

First post!   In order to somewhat redeem myself for an unnecessary call to support,  I've decided to post some classes that I've written to monitor certain metrics inside our Ensemble Live instance (yeah, Kyle, you WERE laughing at me, but it's okay).  What the classes do is to run queries and code to get database sizes, status of the mirror, counts of rows in tables such as EnsLib.HL7.Message and Ens.MessageHeader.  The data is collected and written to tables and then an email is sent out daily upon completion.  I've found this quite useful in keeping an eye on what's going on.

5
1 1029
Question Scott Beeson · Jun 27, 2016

We have multiple implementations spanning many namespaces and edges.  I would like to see if I could identify a single place, perhaps on HSREGISTRY or HSBUS, that I could capture certain events like searches (from all customers) and record transfers (with requester and provider).  

The goal is to have a dashboard that would show simple stats such as searches by participant, records shared by participant and records consumed by participant.  These are the 3 most important.

I appreciated the feedback on the other question of "how" but now I'm hoping to find the "Where".  

1
0 445
Article Barry Cooper · Apr 8, 2016 1m read

Presenter: Barry Cooper
Task: Enable users to perform analytics within an application and take actions based on those analytics
Approach: Provide examples of embedding DeepSee within applications
 

Analytics is more than just using data to provide insight. Analytics is about taking action on that insight. See examples of how you can embed DeepSee in your applications, allowing you to take action.

Content related to this session, including slides, video and additional learning content can be found here.

0
0 203
Article Luca Ravazzolo · Apr 7, 2016 1m read

Presenter: Luca Ravazzolo
Task: Track the status and performance of clustered environments
Approach: Give examples of using modern technology to spot potential bottlenecks before they turn into problems
 

This session will discuss how modern technology can be used to keep track of the status and performance of your cloud clustered environments.

Content related to this session, including slides, video and additional learning content can be found here.

0
0 315