Question Luis Gallardo · Mar 24

System state hung on IRIS running on Kubernetes

Hi!
We are deploying the IRIS image in a Kubernetes environment and the cluster state is "Hung". Looking at the alerts endpoint, we get two alerts:

[
  {
     "time":"2026-03-24T13:45:44.548Z",
     "severity":"2",
     "message":"System appears to have failed over from node a69a9f137593"
  },
  {
     "time":"2026-03-24T13:46:30.274Z",
     "severity":"2",
     "message":"Error: <PROTECT>KillAlive+1^%SYS.CDIRECT in SERVERS"
  }
]
Any idea where those are coming from and how to address them? I have noticed that ^%SYSMONMGR is stopped; if I start it, the state changes to OK and everything is fine.
Log section 1
03/24/26-13:45:44:446 (2287) 0 [Utility.Event] Log Monitor Started
03/24/26-13:45:44:548 (2286) 2 [Utility.Event] System appears to have 
03/24/26-13:45:44:562 (2288) 0 [Utility.Event] Clean Daemon Started
Log section 2
03/24/26-13:46:30:271 (2286) 0 [Utility.Event] Processing OS Package Requirements section
03/24/26-13:46:30:274 (2298) 2 [Utility.Event] Error: <PROTECT>
03/24/26-13:46:30:544 (2286) 0 [Generic.Event] Auditing to /infinity/durable/mgr/irisaudit/
Thanks!

Comments

DC AI Bot · Mar 24

A "Hung" system state in InterSystems IRIS on Kubernetes is usually related to the monitoring services themselves or to resource-configuration errors.

  1. Explanation of Error Messages:

    • The error <PROTECT>KillAlive+1^%SYS.CDIRECT in SERVERS is a protection (insufficient-privilege) error raised from the %SYS.CDIRECT routine; it usually indicates monitoring code was blocked from accessing a protected resource, which is consistent with the System Monitor not running properly.
  2. Stopped System Monitoring Manager (^%SYSMONMGR):

    • Activating ^%SYSMONMGR resolves the “Hung” state and returns the system to an operational mode. ^%SYSMONMGR monitors resources and takes actions to ensure stability.
  3. Possible Causes for Hung State:

    • Deadlocks in IRIS often result from exhausted shared memory or a full disk, leaving the system stuck. An undersized lock table (locksiz) or running out of disk space for journals are common contributing factors.
    • An oversized memory configuration (e.g., bbsiz, gmheap) can also cause unexpected memory-usage spikes that freeze the system.
  4. Mitigation and Recommendations:

    • Keep ^%SYSMONMGR active so resource monitoring stays in place, and use appropriate memory-allocation settings (gmheap, buffer sizes).
    • Regularly review and adjust shared-memory-related parameters to prevent such issues.
    • Use Kubernetes health checks (e.g., liveness probes) to restart unresponsive pods automatically.
    • For further troubleshooting, analyze messages.log for system events and consider contacting InterSystems support.
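Since starting the System Monitor is what resolves the "Hung" state here, it helps to know how to do it from inside the container. A minimal sketch, assuming the pod is named iris-0 and the instance is named IRIS (both hypothetical for this example); ^%SYSMONMGR is a menu-driven utility, so the options are chosen interactively:

```shell
# Open an IRIS terminal session in the pod (pod/instance names are assumptions)
kubectl exec -it iris-0 -- iris session IRIS -U %SYS

# Inside the session, start the System Monitor from its menu:
#   Do ^%SYSMONMGR
#   1) Start/Stop System Monitor
#   1) Start System Monitor
```

Re-entering the same menu afterwards should report the monitor as running.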

If further clarification or investigation is required, reaching out to the Developer Community for more advanced help is advised.
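The liveness-probe suggestion above can be sketched directly in the pod spec. A minimal fragment, assuming the default internal web server port 52773 and the REST monitoring endpoint /api/monitor/metrics (the same API family as the /api/monitor/alerts endpoint the poster queried); both values may differ in your deployment:

```yaml
# Hypothetical container spec fragment; port and path are assumptions
livenessProbe:
  httpGet:
    path: /api/monitor/metrics   # REST monitoring endpoint
    port: 52773                  # default internal web server port
  initialDelaySeconds: 60        # give IRIS time to start before probing
  periodSeconds: 30
  failureThreshold: 3            # restart the pod after repeated failures
```

Note that this restarts the pod when it becomes unresponsive; it does not address the underlying cause, so the messages.log analysis above still applies.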
