I built a monitoring system in Grafana using the IRIS API /api/monitor/metrics (scraped with Prometheus), but I noticed that the RAM usage it reported was lower than the usage shown by the operating system. I installed the Zabbix agent and its usage values were higher; the graph had the same highs and lows, only shifted.
In a Business Operation, how can we tell which source sent the current primary request when multiple requests arrive at the same time?
In response to the infrastructure needs of our company's service, I've created a small API that sends SNMP queries to InterSystems so that relevant data can be retrieved and visualized once the infrastructure team implements monitoring.
However, I'm experiencing a timeout issue when attempting to collect information using an SNMP walk. Here is the code for my API's SNMP service:
I need to develop a tool that identifies which data a given process consumes, so that I can collect all the data needed to build an automated test scenario.
For example, some user process will pull data from ^GLOBAL(1)="dataString", ^GLOBAL(2)="dataString2", ^GLOBAL1(1)="data1String", ^GLOBAL2(4)="data2String4". Among all the other data in these globals, I want to ignore everything that the user process did not use and capture only the specific keys it read.
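The idea of keeping only the keys a process actually read can be pictured as a wrapper that records every lookup. The sketch below is a Python illustration of that idea, not IRIS code: `TrackedGlobal`, `user_process`, and the sample data are hypothetical stand-ins for a global and for the process under observation.

```python
class TrackedGlobal:
    """Hypothetical stand-in for a global: a mapping that records which keys were read."""

    def __init__(self, name, data):
        self.name = name
        self._data = dict(data)
        self.accessed = {}  # key -> value, in first-read order

    def get(self, key):
        value = self._data[key]
        # Remember the key and the value seen the first time it was read.
        self.accessed.setdefault(key, value)
        return value


def user_process(g):
    # Hypothetical process under observation: it reads only two of the keys.
    return g.get(1) + "|" + g.get(2)


glob = TrackedGlobal("^GLOBAL", {1: "dataString", 2: "dataString2", 3: "unused"})
result = user_process(glob)
snapshot = glob.accessed
print(glob.name, snapshot)
```

After the run, `snapshot` holds only the keys the process touched, which is the subset one would export as test data.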
In the Windows Resource Manager I can observe multiple parallel processes from cache.exe performing read operations on journal files.
All except one of these processes show the same read rate (bytes/s). The processes point to different journal files and constantly read between 200 and 3000 bytes/s.
Looking up one of these processes by PID in the Caché Management Portal shows %SYS.Monitor.Control.1. In 3 days of server uptime it has run 181,632,583 commands and modified 32,140,642 globals.
My employer set up a web-based HL7 interface monitor dashboard that displays all Ensemble components (Service/Process/Operation) in a Production, their status, and the support information embedded in each interface's listing on the monitor. Please see the 3 screenshots.
This is part of the URL that we use to access this web-based monitor: ......57772/csp/healthshare/monitor/Rush.Monitor.Web.Home.cls
I'm using the EnumerateJobStatus query of class Ens.Util.Statistics to obtain the LastActivity value of a Business Host.
I would expect that this would return the timestamp of the last message received by the BH, understanding that any connect/disconnect activity would reset that timer. However, the time returned appears to actually be the time at which Ens.MonitorService generated the alert and is not directly related to anything that happened in the BH itself.
Alerts are messages generated by production components. InterSystems IRIS automatically writes the alerts to a log file and sends them to the production component named Ens.Alert. If your production does not have a component named Ens.Alert, then InterSystems IRIS writes alerts to the log file but does not send them to any component. The component named Ens.Alert can be of any class. The most frequently used classes for Ens.Alert are:
Currently we are using an older HealthShare instance, but I am not opposed to using IRIS as we will upgrade eventually.
Currently, for monitoring productions we have a Monitor screen. We have both the Queues page and a DeepSee dashboard that shows the current status of our services. The issues with our current DeepSee approach, which uses traffic lights, are: 1) the page is a bit slow to load the metrics, and 2) every new service from the team needs a new widget created, and although this is easy enough to do, it is time consuming.
I'm trying to configure the Caché Monitor Manager (^MONMGR) utility to send alert e-mails. Following the steps, I have doubts about how to configure the options under "Set Server" to send e-mail through Hotmail or Outlook (smtp-mail.outlook.com). I don't know how to configure the mail server SSLConfiguration for Hotmail or Outlook. Could you help me? Thank you!
After updating from 2018.2.1 to 2021.1, we observe a change in the behaviour of the Message Bank Enterprise Monitor.
In 2018.2.1, clicking on a specific line in the list of configured systems opened the system dashboard, giving insight into queue counts and error conditions.
UPDATE: It turns out it was just me being a dummy, and the snmpd was correctly telling me there is no value associated with that exact key. I should have used snmpwalk instead of snmpget to display the whole tree.
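The difference between the two commands can be illustrated with a small in-memory model. This is a simplified Python sketch, not the real SNMP protocol (an actual walk is built from repeated GETNEXT requests), and the OIDs and values below are made-up placeholders rather than entries from the Caché MIB.

```python
# Hypothetical MIB contents: only leaf instances carry values.
MIB = {
    (1, 3, 6, 1, 4, 1, 99999, 1, 1, 1): "instance-A",
    (1, 3, 6, 1, 4, 1, 99999, 1, 1, 2): "running",
    (1, 3, 6, 1, 4, 1, 99999, 1, 2, 1): 42,
}


def parse(oid):
    """Turn a dotted OID string into a tuple of integers."""
    return tuple(int(part) for part in oid.strip(".").split("."))


def snmp_get(oid):
    """Like snmpget: exact match only; None when no value sits at that exact key."""
    return MIB.get(parse(oid))


def snmp_walk(oid):
    """Like snmpwalk: every instance below the given prefix, in OID order."""
    prefix = parse(oid)
    return [(k, v) for k, v in sorted(MIB.items()) if k[:len(prefix)] == prefix]


base = "1.3.6.1.4.1.99999.1"
print("get :", snmp_get(base))
for key, value in snmp_walk(base):
    print("walk:", ".".join(map(str, key)), "=", value)
```

Here the get on the subtree root finds nothing because no value is stored at that exact key, while the walk lists all three instances beneath it.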
Original Post follows:
Hello! I'm trying to set up SNMP monitoring on Caché, using the documentation and this article.
I believe most of you have encountered this problem: a HealthConnect/Ensemble user gets a slow response and asks for a measurement of how long Ensemble took to process the request, but the Ensemble 'activity data' gives no clue about the delay.
The reason is that HealthConnect message measurement is based on the Ensemble message, which cannot tell you when Ensemble received the request or when it sent back the response. When there is a delay in the inbound/outbound adapter or the CSP Gateway, there is no way to find it from the "activity data".
The current version of SAM creates Prometheus metric endpoints that appear to be handled correctly by the current Prometheus scraper; however, the metrics do not conform to the current Prometheus standard. The standard states:
I installed a community version of InterSystems IRIS on a large AWS EC2 instance to do some testing. I installed SAM, and when I try to "Add a new cluster" I receive the following: "ERROR #5005: Cannot open file '/config/prometheus/isc_tmp_yml_file.yml'"
I created a task in the Management Portal Task Manager to use the Ens.Util.Tasks.Purge task. The task setup includes email notifications for both the completion email and the error email.
This task is giving an error and no email is generated:
I just watched the recording of Michael Brady's presentation on Ensemble Disk Free Space Monitoring. Is the sample code for the Task definition class still available? How can I obtain a copy?
A long time ago I enabled Activity Monitoring to save myself headaches in the future when looking at the performance of various message routes through our productions. It has served its purpose of answering questions such as how many messages we process a week, but I had not had the chance to really dig into the stats for specific message types or destinations to pinpoint issues.
Off the back of the Interface Monitoring post, I created a class that queries the Ens.AlertRequest global and returns the entries between 6pm the night before and 6am in the morning.
I tested this build in our T&D environments and the build worked very well.
However, in our production environment the query output is truncated, by what I believe to be a timeout, and I get only partial output.
In the System>SQL pages my 12-hour query times out.
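The overnight window described above (6pm the previous evening to 6am the same morning) can be computed explicitly before it is passed to a query. The following is a minimal Python sketch of that date arithmetic only; `overnight_window` is a hypothetical helper name, and the example date is arbitrary.

```python
from datetime import datetime, time, timedelta


def overnight_window(now):
    """Return (start, end): 18:00 on the previous day and 06:00 on the day of `now`."""
    end = datetime.combine(now.date(), time(6, 0))
    start = datetime.combine(now.date() - timedelta(days=1), time(18, 0))
    return start, end


# Example: a report generated on the morning of 2024-03-15.
start, end = overnight_window(datetime(2024, 3, 15, 8, 30))
print(start.isoformat(sep=" "), "to", end.isoformat(sep=" "))
```

Using fixed bounds like these keeps the query range at exactly 12 hours regardless of when the report is run.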
We are constantly running into issues where billions of orphaned messages in our system cause problems, and we have to run a cleanup manually to fix the resulting performance issues.
I recently discovered the Monitoring Activity Volume feature in IRIS and I was amazed by it. So, I put it to work in one of our productions. It is nice how easy it is to set up, and how many possibilities come with it.
But there's something weird: the numbers. Actually, one of the BPs reports a processing time of more than 6 seconds: