System state hung on IRIS running on a Kubernetes
Hi!
We are deploying the iris image in a Kubernetes environment and the cluster state reports "Hung". Looking at the alerts endpoint we get two alerts:
[
{
"time":"2026-03-24T13:45:44.548Z",
"severity":"2",
"message":"System appears to have failed over from node a69a9f137593"
},
{
"time":"2026-03-24T13:46:30.274Z",
"severity":"2",
"message":"Error: <PROTECT>KillAlive+1^%SYS.CDIRECT in SERVERS"
}
]
Any idea / help on where those are coming from and how to address them? I found the following entries in the ^%SYSMONMGR log:
Log section 1
03/24/26-13:45:44:446 (2287) 0 [Utility.Event] Log Monitor Started
03/24/26-13:45:44:548 (2286) 2 [Utility.Event] System appears to have failed over from node a69a9f137593
03/24/26-13:45:44:562 (2288) 0 [Utility.Event] Clean Daemon Started
Log section 2
03/24/26-13:46:30:271 (2286) 0 [Utility.Event] Processing OS Package Requirements section
03/24/26-13:46:30:274 (2298) 2 [Utility.Event] Error: <PROTECT>KillAlive+1^%SYS.CDIRECT in SERVERS
03/24/26-13:46:30:544 (2286) 0 [Generic.Event] Auditing to /infinity/durable/mgr/irisaudit/
Thanks!
Comments
The "Hung" system state in an InterSystems IRIS environment on Kubernetes is typically related to a stopped monitoring service or to errors caused by resource configuration.
Explanation of Error Messages:
- The error <PROTECT>KillAlive+1^%SYS.CDIRECT in SERVERS can indicate resource locking or monitoring-related issues; a <PROTECT> error generally means a process lacked the privileges needed for an operation.

Stopped System Monitoring Manager (^%SYSMONMGR):
- Activating ^%SYSMONMGR resolves the "Hung" state and returns the system to an operational mode. ^%SYSMONMGR monitors resources and takes actions to ensure stability.
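As a sketch, the System Monitor can be restarted from a terminal inside the pod via the ^%SYSMONMGR menu in the %SYS namespace (the pod name is a placeholder, and "IRIS" is assumed to be the instance name):

```
kubectl exec -it <iris-pod> -- iris session IRIS -U %SYS
%SYS>do ^%SYSMONMGR

1) Start/Stop System Monitor
...
Option? 1
```

From the Start/Stop submenu you can then start the System Monitor if it is stopped.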
Possible Causes for Hung State:
- Deadlocks often happen in IRIS due to overutilization of shared memory or a full disk, leaving the system stuck. Insufficient values for locksiz, or running out of disk space for journals, are common contributing factors [1].
- An overloaded memory configuration (e.g., bbsiz, gmheap) can also cause unexpected memory-usage spikes that freeze the system [2].
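As a quick first check for the disk-full case, you can inspect the durable volume from inside the pod. A minimal sketch, assuming the official container images where ISC_DATA_DIRECTORY points at the durable %SYS volume (the kubectl target is a placeholder):

```shell
# Run inside the IRIS container, e.g.:
#   kubectl exec -it <iris-pod> -- sh
# Fall back to the current directory so the snippet also runs outside a pod.
df -h "${ISC_DATA_DIRECTORY:-.}"                        # journal/WIJ volume filling up?
command -v ipcs >/dev/null && ipcs -m | head || true    # shared-memory segments allocated?
```

If the volume is near 100%, freeing or expanding journal space is usually the fastest way to unstick the instance.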
Mitigation and Recommendations:
- Activate ^%SYSMONMGR to maintain resource monitoring, and use proper configurations for memory allocation (gmheap, buffer sizes).
- Regularly reviewing and adjusting shared-memory-related settings can prevent such issues.
- Utilize K8s health checks (e.g., liveness probes) to automatically restart unresponsive pods [2].
- For further troubleshooting, analyzing messages.log for system events and consulting support could be beneficial [1].
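A minimal sketch of such probes for an iris container (timings are assumptions; 1972 is the default superserver port, and `iris qlist` is the instance-status CLI in the official images; adjust for your deployment):

```yaml
livenessProbe:
  exec:
    # "IRIS" is the default instance name in the official images (assumption).
    command: ["/bin/sh", "-c", "iris qlist IRIS | grep -q running"]
  initialDelaySeconds: 60
  periodSeconds: 30
  failureThreshold: 3
readinessProbe:
  tcpSocket:
    port: 1972          # superserver port
  periodSeconds: 10
```

An exec probe on the instance status catches a hung instance that still accepts TCP connections, which a plain tcpSocket liveness probe would miss.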
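To pull the high-severity events out of messages.log, a simple filter on the severity column works. A sketch (the kubectl command and log path are assumptions based on the audit path in the question; the demo lines are taken from the log excerpts above):

```shell
# From outside the pod you might run (pod name hypothetical):
#   kubectl exec <iris-pod> -- awk '$3 >= 2' /infinity/durable/mgr/messages.log
# Offline demo on the format shown above: field 3 is the severity (0-3),
# so this prints only the severity-2 line.
awk '$3 >= 2' <<'EOF'
03/24/26-13:45:44:446 (2287) 0 [Utility.Event] Log Monitor Started
03/24/26-13:46:30:274 (2298) 2 [Utility.Event] Error: <PROTECT>
EOF
```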
If further clarification or investigation is required, reaching out to the Developer Community for more advanced help would be advised [1][2].
Sources: