Caché process failures on RHEL V7.2


The InterSystems WRC has investigated several process failures that can be attributed to a recent change in Red Hat Enterprise Linux.

A new feature implemented in RHEL V7.2 (systemd-219-19.el7.x86_64) can cause operating system IPC (inter-process communication) semaphores to be deallocated when a non-system RHEL user logs out. System users, i.e. those with a UID below 1000, are excluded.
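To see which semaphore sets fall under this rule on a given host, one option is to read /proc/sysvipc/sem, where the Linux kernel lists each System V semaphore set with its owning UID. The following is a minimal sketch, not an InterSystems tool: the function name, the sample data, and the `first_regular_uid` parameter are hypothetical, and the 1000 threshold is taken from the description above.

```python
def at_risk_semaphore_sets(proc_sem_text, first_regular_uid=1000):
    """Return (semid, uid) pairs for semaphore sets owned by non-system users.

    proc_sem_text is the content of /proc/sysvipc/sem. The UID threshold
    follows the article (UIDs below 1000 are excluded); it is an assumption
    that this matches the host's actual configuration.
    """
    lines = proc_sem_text.strip().splitlines()
    if not lines:
        return []
    header = lines[0].split()  # column names: key, semid, perms, nsems, uid, ...
    at_risk = []
    for line in lines[1:]:
        fields = line.split()
        if len(fields) != len(header):
            continue  # skip anything that does not match the header layout
        row = dict(zip(header, fields))
        if int(row["uid"]) >= first_regular_uid:
            at_risk.append((int(row["semid"]), int(row["uid"])))
    return at_risk


# Hypothetical sample in the layout of /proc/sysvipc/sem.
SAMPLE = """\
       key      semid perms      nsems   uid   gid  cuid  cgid      otime      ctime
         0      32768   660         10   500   500   500   500 1475163687 1475163600
   1234567      65537   660          5  1001  1001  1001  1001 1475163687 1475163600
"""

if __name__ == "__main__":
    for semid, uid in at_risk_semaphore_sets(SAMPLE):
        print(f"semid {semid} is owned by non-system UID {uid}")
```

On a live system the same function can be fed the text read from /proc/sysvipc/sem. A set that appears in the result is only a candidate: whether it is actually removed depends on the RemoveIPC setting and on the owning user logging out.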

Internally, Caché uses IPC semaphores to control the operation of Caché processes, for example when waking up a Caché process. It does this through the “semop” system call. If the operating system unexpectedly removes semaphores that Caché relies on, processes can fail. When this occurs, the following message appears in cconsole.log:

“System error while trying to wake-up a process, code = 22”

along with corresponding errors in the Caché SYSLOG, such as the following typical example:

Err   Process   Date/Time               Mod   Line   Routine                Namespace
22    39761     09/29/2016 04:41:27PM   61    359    BF0+1359^Ens.Queue.1   HSBUS

This can eventually cause the Caché instance to hang.
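A quick way to check an instance for this symptom is to scan cconsole.log for the message quoted above. The sketch below is illustrative only: the function name is hypothetical, and treating the logged code as a Linux errno value is an assumption. Under that assumption, 22 is EINVAL, which semop reports when the semaphore set does not exist.

```python
import errno
import re

# The message text is taken from the article; the rest of the log line
# format is not assumed, so the pattern matches anywhere in a line.
WAKEUP_ERROR = re.compile(
    r"System error while trying to wake-up a process, code = (\d+)"
)


def find_wakeup_errors(log_lines):
    """Return (line_number, code, errno_name) for each wake-up error found."""
    hits = []
    for number, line in enumerate(log_lines, start=1):
        match = WAKEUP_ERROR.search(line)
        if match:
            code = int(match.group(1))
            # Assumption: the code is an errno value; unknown codes are labeled.
            hits.append((number, code, errno.errorcode.get(code, "unknown")))
    return hits


if __name__ == "__main__":
    # Hypothetical lines; only the message itself comes from the article.
    sample = [
        "some unrelated entry",
        "System error while trying to wake-up a process, code = 22",
    ]
    for number, code, name in find_wakeup_errors(sample):
        print(f"line {number}: code {code} ({name})")
```

To scan a real log, pass the open file object for cconsole.log as `log_lines`; the function only iterates over lines.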

Red Hat has published an article that gives more details about this feature and how to disable it:


https://access.redhat.com/solutions/2062273
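The feature is controlled by the RemoveIPC option in the [Login] section of /etc/systemd/logind.conf, and setting RemoveIPC=no disables it. As a hedged sketch of how to check what a host currently has configured, the hypothetical helper below reports the explicit value in the file. It does not say what default applies when the option is absent or commented out, since that depends on the installed systemd build.

```python
def remove_ipc_setting(conf_text):
    """Return the explicit RemoveIPC value ('yes'/'no'/...) or None if unset.

    conf_text is the content of /etc/systemd/logind.conf. Only uncommented
    assignments inside the [Login] section are considered; if the option
    appears more than once, the last assignment wins.
    """
    value = None
    in_login = False
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # blank line or comment
        if line.startswith("[") and line.endswith("]"):
            in_login = line == "[Login]"
            continue
        if in_login and "=" in line:
            key, _, val = line.partition("=")
            if key.strip() == "RemoveIPC":
                value = val.strip().lower()
    return value


# Hypothetical file content with the feature explicitly disabled.
SAMPLE_CONF = """\
[Login]
#KillUserProcesses=no
RemoveIPC=no
"""

if __name__ == "__main__":
    setting = remove_ipc_setting(SAMPLE_CONF)
    print("RemoveIPC is not set explicitly" if setting is None else f"RemoveIPC={setting}")
```

After changing the file, systemd-logind has to pick up the new setting; the Red Hat article linked above describes the supported procedure.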
 

This issue has been fixed in systemd-219-19.el7_2.4, released with RHBA-2016-0199 (https://rhn.redhat.com/errata/RHBA-2016-0199.html).


Comments

Even with RemoveIPC=yes in /etc/systemd/logind.conf you are fine if the Caché installation uses owner=cachesys and the group allowed to start/stop Caché is cachemgr. You can use any names you like instead of cachesys and cachemgr, but cachesys must not be the same user as cacheusr (in terms of UID). This works even if the cachesys UID is > 1000, i.e. even if it is considered a non-system user.