Off the back of the Interface Monitoring post I had created a class that queries the Ens.AlertRequest global and returns the entries between 6pm the night before and 6am in the morning.
I tested this build in our T&D environments and the build worked very well.
However in our production environment the query is being truncated, by what I believe to be a timeout and I get a partial query output.
In the System>SQL pages my 12 hour query times out.
A long time ago I enabled Activity Monitoring to be able to save myself headaches in the future when looking at the performance of various message routes through our productions. It's served it's purpose of answering questions on how many messages we process a week etc but I had not had the chance to really dig down into the stats for specific message types or destinations to pin point issues.
As per the documentation of QueueCountAlert: Number of messages on this item's queue needed to trigger an Alert message to be sent. Note that no further alerts will be sent unless the number of messages on the queue drops below 80% of this number and then rises again to this number. Note that this alert will be sent even if AlertOnError is False. Zero means no alerts of this type will be sent.
Now, the question is, If QueueCountAlert is set to 10, and the queue size become 11 we will be getting email once.
Alerts are messages generated by production components. InterSystems IRIS automatically writes the alerts to a log file and sends then to the production component named Ens.Alert. If your production does not have a component named Ens.Alert, then InterSystems IRIS writes alerts to the log file but does not send them to any component. The component named Ens.Alert can be of any class. The most frequently used classes for Ens.Alert are:
I just watched the recording of Michael Brady's presentation on Ensemble Disk Free Space Monitoring. Is the sample code for the Task definition class still available? How can I obtain a copy?
I'm using the EnumerateJobStatus query of class Ens.Util.Statistics to obtain the LastActivity value of a Business Host.
I would expect that this would return the timestamp of the last message received by the BH, understanding that any connect/disconnect activity would reset that timer. However, the time returned appears to actually be the time at which Ens.MonitorService generated the alert and is not directly related to anything that happened in the BH itself.
Internally we use splunk for monitoring applications and network.
Does Ensemble have a way of exposing internal metrics and/or a way of exposing custom built metrics?
I've used Deepsee dashboards in the past to monitor Apache Tomcat/Apache Camel/hawtio using JMX rest calls. This is the other way around and ideally I'd like to expose metrics on:
My employer set up a web-based HL7 interface monitor dashboard that will display all Ensemble components (Service/Process/Operation) in a Production, their status, and the support information embedded in each interfaces listing on the Monitor. Please see 3 screenshots.
This is part of the URL that we go to when accessing this Web based Monitor: ......57772/csp/healthshare/monitor/Rush.Monitor.Web.Home.cls
My group needs to be able to monitor items / tasks, and let a non-management-portal user see the monitoring. Is it possible to run DeepSee queries on Production items? I feel like I should not be recreating the production environment or the task manager just so that I can query on the items that are running, and on their states (like "successful" or "send email").
Also, I need to log custom events for each task, and I'm running into difficulties with the task manager in this regard; hence the question about using the Production instead, but querying it.
Has anyone done any kind of integration with Dynatrace, which is a JVM transaction monitoring tool? Our organization uses this extensively with our Java and .Net applications and we wanted to know if it is even possible.
In the Windows Ressource Manager I can observe multiple parallel processes coming from cache.exe with read operations to journaling files.
All except one of these processes have the same reads(Byte/s). The processes point to different journal files and constantly read between 200 and 3000 Bytes/s.
The corresponding process via PID in the management portal of Caché shows the process %SYS.Monitor.Control.1. In 3 days of uptime on the server it has run 181.632.583 commands and modified 32.140.642 globals.
Can someone direct me to where in the documentation we can find how consumption may be calculated for global storage?
Caché Version
2010.1
Operating System
HP OpenVMS 8.4
EDIT: After receiving some responses, it seems I was unclear in my initial inquiry. I am looking to determine our rate of consumption of storage; however, I am having some difficulty in doing that.
We want to monitor an Ensemble Production and send custom email alerts in function of some Rules. For example, if we normally receive 1 message per second, if suddenly we receive 5 or more messages per second, we want to send an email alert. And if tomorrow we don't want to check this again, we want to disable it through Ensemble Business Rules.
I installed a community version of Intersystems IRIS in a Large AWS EC2 instance to do some testing. I installed SAM and when I try to "Add a new cluster" I receive the following: "ERROR #5005: Cannot open file '/config/prometheus/isc_tmp_yml_file.yml'"
I know there's a whole chapter on the subject but I would love a super simple video demo or sample configuration or training course. The myriad menu of options and unfamiliar prompts can make it a bit daunting. The challenge is simple. Send an email notification if the license usage exceeds n% LU consumption. Why? A recent software change seemed to be responsible for causing the LU total consumption to reach 100%.
In looking at the Production monitor within Ensemble, I was wondering if there is a way we could customize it for our use. I notice it is basically a dashboard.
For example I would only like to truly display those Services, Processes, and Operations that are truly in dire need of attention. The Monitor out of the box just seems too busy, and I would like to simplify it.
The current version of SAM creates Prometheus metric endpoints which appear to be handled correctly by the current prometheus scraper, however the metrics do not confirm to the current prometheus standard. The standard states:
In response to the infrastructure needs of our company's service, I've created a small API that sends SNMP queries to InterSystems to visualize relevant data for retrieval when the infrastructure implements monitoring.
However, I'm experiencing a timeout issue when attempting to collect information using an SNMP walk. Here is the code for my API's SNMP service:
Whenever the Windows SNMP Service restarts, the snmpdbg log says the following.
13:08:59 :Attempting initial TCP connection(s) with 1 Cache instances ... 13:08:59 :Get connection with ENSEMBLE on port 1972 13:08:59 :Connection refused on port 1972, check if Cache instance ENSEMBLE is started. 13:08:59 :Cache iscsnmp.dll initialized for 1 configs
Ensemble and all productions are running. I've set up Caché SNMP agent on many other servers in our company and those are working fine. However this one server won't budge.
Can someone tell me if intersystems-ru/deepsee-sysmon-dashboards is developed for a specific version of Ensemble? Looks like it could be useful to my group but we aren't upgrading till later this year and we are on 2015.2.2.