#Monitoring

5 Followers · 175 Posts

Monitoring is a process of controlling and management of performance and availability of software applications.

Question Sergey Pavlov · Sep 3, 2021

UPDATE:
It turns out it was just me being a dummy, and the snmpd was correctly telling me there is no value associated with that exact key. I should have used snmpwalk instead of snmpget to display the whole tree.
Original Post follows:
Hello!
I'm trying to set up SNMP monitoring on Caché, using documentation and this article
I'm running net-snmp on Red Hat Enterprise Linux Server release 7.3 (with CentOS repositories), and Caché version 2017.1
It looks like snmpd is running as AgentX master, and Caché subagent is running too
/opt/cache/mgr/SNMP.

1
0 4581
Question Glenn van Bavel · May 4, 2017

Whenever the Windows SNMP Service restarts, the snmpdbg log says the following. 

13:08:59 :Attempting initial TCP connection(s) with 1 Cache instances ...
13:08:59 :Get connection with ENSEMBLE on port 1972
13:08:59 :Connection refused on port 1972, check if Cache instance ENSEMBLE is started.
13:08:59 :Cache iscsnmp.dll initialized for 1 configs

Ensemble and all productions are running. I've set up Caché SNMP agent on many other servers in our company and those are working fine. However this one server won't budge. 

Does anyone have any idea what the problem may be here? 

Regards,

Glenn

10
0 1379
Question Kevin Mayfield · Nov 19, 2016

Internally we use splunk for monitoring applications and network.

Does Ensemble have a way of exposing internal metrics and/or a way of exposing custom built metrics? 

I've used Deepsee dashboards in the past to monitor Apache Tomcat/Apache Camel/hawtio using JMX rest calls. This is the other way around and ideally I'd like to expose metrics on:

  • HL7v2 messages (broken down into types)
  • Production performance
  • Error and Warning.

Understand Ensemble 2016.2 includes Java 8 JVM and was wondering if the JMX route (plus hawtio) is the way to do this?

3
0 1308
Question Greg Billington · Jul 14, 2016

I am looking for a database management tool I would have expected to find something like on the SMP website

Aim

show current database usage (ie size allocation) by database then table etc and allow continued drill down,

show information as a table, so can then sort by size to find the biggest item easily

also show it graphically

And then have ability to track and trend growth in size over time

identify a normal growth pattern

alert if variation (higher or lower) from normal based on recent trend

is there a tool that can do this, or a 3rd party tool, even it can only operate via standard SQL JDBC/ODBC to get a partial view.

2
0 1239
Question Laura Blázquez García · Feb 8, 2017

Hi,

We want to monitor an Ensemble Production and send custom email alerts in function of some Rules. For example, if we normally receive 1 message per second, if suddenly we receive 5 or more messages per second, we want to send an email alert. And if tomorrow we don't want to check this again, we want to disable it through Ensemble Business Rules.

4
0 1212
Question Lucas Fernandes · Feb 26, 2018

Hi community,
I need to monitor Caché Intersystems with some custom indicators.

I started customizing the SNMP Mib. But I've been in a Zabbix event, all speakers use ODBC to monitor their database, Oracle, MySQL, PostgreSQL ...
What is the best way? Use ODBC or SNMP Custom Mib?
What are you guys using?

2
0 1186
Question Guilherme Silva · Jan 3, 2018

I want to understand how this message is build:

[SYSTEM MONITOR] CPUusage Alert: CPUusage = 99, 99, 99 (Max value is 85).

Caché keep a log of cpu usage (99,99,99) and how is the frequency of check of this?

how can i chance the max value? is that possible?

Best,

4
0 1029
Question Scott Roth · Oct 2, 2019

We are constantly running into issues where there are billions of Orphaned messages in our system that cause problems, and we have to manually run a cleanup to fix performance issues.

 In the following article about orphaned messages... https://community.intersystems.com/post/ensemble-orphaned-messages it mentions either programmatically eliminating the Orphaned messages or using a Utility like Demo.Util.CleanupSet in ENSDEMO.

7
2 1010
Question Tirthankar Bachhar · Oct 20, 2016

Hi,

When we write unit test cases for cache object script code using %UnitTest.TestCase, what  is the best way to write code to identify code coverage?

So, let say my unit test case hit all 10 lines of code of a method for a given class. So, unit test coverage should be 100% for that. But, using line-by-line coverage [(%Monitor.System.LineByLine] getting wrong percentage, because it also includes code comment/documentation as part of code. So, practically we can not ever achieve 100% of code coverage by using this API.

I'm not sure, if am able to describe the problem properly here.

2
0 954
Question Scott Roth · Oct 12, 2018

In looking at the Production monitor within Ensemble, I was wondering if there is a way we could customize it for our use. I notice it is basically a dashboard.

For example I would only like to truly display those Services, Processes, and Operations that are truly in dire need of attention. The Monitor out of the box just seems too busy, and I would like to simplify it.

I was trying to find a sample how a Monitor Dashboard would be setup, but I am not seeing anything in ENSDEMO, or SAMPLES. Has anyone created a Custom Dashboard/Monitor for their purposes?

2
3 938
Question Erin Dolson · Jul 16, 2019

Hi all,

I'm looking to set up monitoring for several interfaces. I understand that I can set an Inactivity Timeout. However, obviously there are messages coming through more frequently during certain hours than other hours. 

Is there a way to set an Inactivity Timeout for each hour of the day instead of one value that is used all day long? 

Best,

Erin

12
3 921
Question Mack Altman · Jan 21, 2017

Can someone direct me to where in the documentation we can find how consumption may be calculated for global storage?

Caché Version 2010.1
Operating System HP OpenVMS 8.4

EDIT: After receiving some responses, it seems I was unclear in my initial inquiry. I am looking to determine our rate of consumption of storage; however, I am having some difficulty in doing that.

While utilizing ^%GSIZE, which is used by the %GlobalEdit class, the results appeared odd. I have provided my results below, which illustrate the global structure on the left and the usage indicated by ^%GSIZE on the right.

6
0 918
Question Hans Rietveld · Aug 29, 2017
Caché Version String: Cache for UNIX (Red Hat Enterprise Linux for x86-64) 2016.2.1

We have a mirrored Ensemble system (110,  backup and 210, primary). At one time (14:00) there is a disruption in the production. The messages are not being processed. 

Looking at the pButtons (every 10 seconds) I see the following abnormal at the WDphase


and the backup

The different values of WDphase are:

0: Idle (WD is not running)

5: WD is updating the Write Image Journal (WIJ) file.

7: WD is committing WIJ and Journal.

8: Databases are being updated.

3
0 916
Question Shawn McCartt · Aug 16, 2017

Is it possible to dynamically adjust the RetryInterval andFailureTimeout settings in a BPL?

I've got a business process that calls a web service operation to get a session ID from an external system.  There is a string property returned in the body of the response that indicate an exception occurred in the external system. I have code in the BPL that examines the property and sets the status property to an error status when that occurs.

Depending on what the value is I want to adjust the RetryInterval and FailureTimeout values used in by the system when the ReplyCodeActions is set to E=RD.

3
0 869
Question Murray Oldfield · Mar 19, 2021

SAM - Hacks and Tips for set up and adding metrics from non-IRIS targets

SAM (System Altering and Monitoring) comes with as a 'batteries included' docker-compose container set that is ready to start monitoring IRIS instances with a default dashboard as soon as it starts up. The initial configuration is good to understand SAM functionality and start basic monitoring of your IRIS systems. However, out of the box, there are some setting s that you will need to change when you start to monitor many systems and collect a lot of metric data.

5
0 821
Question Mary George · Sep 4, 2020

Hi, 

I created a task from Management portal  Task manager to use the Ens.Util.Tasks.Purge task . Task set up includes email notification setup for Completion email and error email.

This task is giving an error  and no email is generated: 



<CLASS DOES NOT EXIST>zSendMail+22^%SYS.TaskSuper.1 *Security.SSLConfigs

I tested all other task types available from Ens.Util.task but all are giving the same error.

Not sure if this Is this a bug or some missing configuration in the task setup ? Anyone noticed any similar issue or any idea how to fix this ? 


Thank you for your help.

Regards,

Mary

7
0 776
Question Alexey Maslov · May 11, 2017

Since most of our customers moved to Caché 2015.1, some admins became abused with CPUPct warnings (sometimes alerts) in console log without other signs of lacking CPU power.
Documentation states that:

          CPUPct               job_type              CPU usage (percent) by all processes of the listed job type in aggregate       

What does it really mean?
E.g., if total system CPU usage is 25%, and all running processes are of the same type (e.g, CSPSRV), would CPUPct be equal to 100%? If so, why this case should be a reason for alert?

4
0 776
Question Scott Roth · May 9, 2016

At the Global Summit several folks had mention that they developed their own production monitor. I am looking to create a monitor similar to eGate that we only display those Services/Processes/Operations that are in trouble, and those Errors that are showing up in the Event Log. Does anyone have any examples of this?

Thanks

Scott Roth

The Ohio State University Wexner Medical Center

3
1 723
Question Mack Altman · Aug 11, 2017

Please excuse my ignorance. I am trying to identify what areas would be best to review in the System Dashboard (for Cache 2010.2) for performance issues with the database. It seems to be running slower than usual, but I am trying to find out the best way to go about identifying what the issue is.

The following are captures from the System Dashboard.

As always, thanks a lot for your help.

System Dashboard

Global and Routine Statistics

ECP Statistics

Disk and Buffer Statistics

3
0 647
Question Jon Astle · Nov 9, 2017

I have Ensemble/Healthshare running in a production environment which is setup with a mirror failover and an arbiter sitting between them.

In the event of a failover we have a number of connections that need stopping/monitoring and starting in a certain order.

Is there a programmatic way we can detect the failover and stop certain services and operations immediately and then start them up again in the required order, checking their connection state before starting the next connection.

I am thinking Ens.Director is probably what I need however I need some guidance on how to implement a solution.

2
0 644
Question Han Ya · Sep 25, 2020

Whenever the Windows SNMP Service restarts, the snmpdbg log says the following. 

16:58:25 :Debug tracing enabled for SNMP agent
16:58:25 :SnmpExtensionInit called, pid=4432, tid=12276
16:58:25 :CreateEvent for CacheSNMPTrap suceeded
16:58:25 :register Cache OID 1.3.6.1.4.1.16563.1
16:58:25 :Get all Cache configs ... 16:58:25 :found 1 configs
16:58:25 :Add ENSEMBLE config to list ... 
16:58:25 :RegOpenKey for SOFTWARE\InterSystems\Cache\Configurations\ENSEMBLE\Properties

4
0 633
Question Edward Jalbert · Jan 21, 2022

We are running HealthShare on Linux Redhat via Azure.

A couple of days ago, the Azure server rebooted. Which we were unaware of. 

Resulting in the Instance being in a downed status.

In the short term I put together a quick script to check the status, if it is down to restart it.

However, before I go down that road, I thought it would be best to inquire if there is a much better and more streamlined solution?

In a nutshell I just want to check and see if the Instance is up or in a state such as  down or hung then start it.

2
0 615
Question Tirthankar Bachhar · Nov 4, 2016

As per the documentation of QueueCountAlert:
Number of messages on this item's queue needed to trigger an Alert message to be sent. Note that no further alerts will be sent unless the number of messages on the queue drops below 80% of this number and then rises again to this number.
Note that this alert will be sent even if AlertOnError is False.
Zero means no alerts of this type will be sent.
Now, the question is,
If QueueCountAlert is set to 10, and the queue size become 11 we will be getting email once.

6
0 596
Question Bharath Nunepalli · Aug 20, 2019

I'm a DBA and support Caché databases on AIX. I coded shell scripts for monitoring journaling status, databases size, license end date.

We recently got a new instance of Caché on Windows. I'm just curious to know whether anyone coded database monitoring scripts on Windows using PowerShell or any other scripting language.

If yes, please share the details.

Thanks & Regards,

Bharath Nunepalli.

3
0 594