August 27, 2020 – Alert: Possible Resource Starvation due to Orphaned Processes

InterSystems has corrected a defect that can cause a build-up of orphaned processes consuming system resources. In extreme cases, this can cause a system to become unresponsive.

This defect affects the following versions:

  • Caché and Ensemble 2018.1.4
  • InterSystems IRIS and InterSystems IRIS for Health 2019.4, 2020.1, and 2020.2
  • HealthShare Health Connect (HSAP) 15.032 built on Ensemble 2018.1.4
  • HealthShare Health Connect 2020.1

No other InterSystems product versions are affected by this issue. Specifically, the following are not affected: earlier versions of Caché and Ensemble; Health Connect 2019.1 and 2019.1.1 and all other HealthShare Health Connect (HSAP) versions; and all HealthShare product versions through HealthShare 2020.1, including Unified Care Record, Information Exchange, Care Community, Clinical Viewer, Health Insight, Patient Index, Personal Community and Provider Directory.

 

The defect results in a failure to properly shut down a %SYS.cspServer3 process when a web server connection is closed.  Over time, these orphaned processes accumulate, consuming resources (CPU, memory, etc.) and leading to an unresponsive system.  Depending on activity from the web server, the accumulation can be quite rapid.

These %SYS.cspServer3 processes are used for WebSockets and Gateway Registry methods, and they are created regardless of whether an application uses those features.

This defect affects any web server connecting to an affected version, including the private Apache web server (not intended for external use) that is included with the product.

The correction for this defect is identified as SDK116. It will be included in all future releases of InterSystems products and is available by requesting an Ad hoc distribution from the InterSystems Worldwide Response Center (WRC).

For applications that use neither WebSockets nor the Gateway Registry, this problem can be avoided by preventing the Gateway from creating these connections. If your application uses WebSockets or calls Gateway Registry methods, you cannot use this workaround.

To avoid this problem, follow these steps:

  1. Add the following line to the [System] section of the Gateway configuration file (CSP.ini):

REGISTRY_METHODS=Disabled

  2. Restart all web servers for the change to take effect.

Perform this procedure on each web server that connects to an affected instance. Existing orphaned processes will remain until the instance restarts, but no new ones will be created once the web server has been restarted with this change in place.
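
Where many web servers need the same change, the configuration edit described above could be scripted. The following is only a minimal sketch in Python, not an InterSystems-provided tool. The function name ensure_registry_disabled and the sample key Example_Key are hypothetical. It also assumes that the section header can be matched case-insensitively and that the setting can sit anywhere inside the section. The sketch works on the file's text, so locating, backing up and writing CSP.ini is left to you.

```python
SETTING = "REGISTRY_METHODS=Disabled"


def ensure_registry_disabled(ini_text: str) -> str:
    """Return ini_text with REGISTRY_METHODS=Disabled in its [SYSTEM] section.

    Hypothetical helper: an existing REGISTRY_METHODS line in that section is
    overwritten; otherwise the setting is inserted right after the header.
    """
    lines = ini_text.splitlines()
    header_idx = None
    in_system = False
    for i, line in enumerate(lines):
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            # Assumption: section names are compared case-insensitively.
            in_system = stripped[1:-1].strip().lower() == "system"
            if in_system and header_idx is None:
                header_idx = i
        elif in_system and stripped.split("=", 1)[0].strip().upper() == "REGISTRY_METHODS":
            lines[i] = SETTING  # setting already present: force the value
            return "\n".join(lines) + "\n"
    if header_idx is None:
        # No [SYSTEM] section found: append one at the end of the file.
        lines += ["[SYSTEM]", SETTING]
    else:
        lines.insert(header_idx + 1, SETTING)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Example_Key is a placeholder, not a real Gateway parameter.
    sample = "[SYSTEM]\nExample_Key=1\n[LOCAL]\nExample_Key=2\n"
    print(ensure_registry_disabled(sample))
```

The function is idempotent, so running it again on an already-updated file leaves the text unchanged. The web servers still need to be restarted afterwards, as described in step 2.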

If you have any questions regarding this alert, please contact the Worldwide Response Center.

Replies

Thanks for the info as I've been experiencing this when evaluating Caché 2018.1.4 and couldn't understand why the server eventually ran out of memory. Can this post be updated when there is a public build that contains the fix (I'm interested in Caché)?  It's very hard to find which bug fixes are included in each build/released version and this bug seems serious enough to get a new public release version built soon?