Question
· Apr 12, 2023

Quick Process to Start/Stop an Object

We are currently using different variations of Ens.Director.EnableConfigItem() calls to start/stop objects within the Interoperability namespace. We are looking for ways to minimize our downtime as we move from AIX to Red Hat servers on a new section of our network.

Besides calling Ens.Director.EnableConfigItem() and waiting for a response, or disabling the objects through the namespace's production class file, is there a quicker way to stop Services and Operations while making sure the TCP disconnect is sent to those endpoints? We need that disconnect so we can move the networking rules to point to the new servers. Killing the processes is out of the question because it will not send the disconnect we are looking for.

Thanks

Product version: IRIS 2022.1
$ZV: IRIS for UNIX (Red Hat Enterprise Linux 8 for x86-64) 2022.1 (Build 209U) Tue May 31 2022 12:13:24 EDT

I need to be able to loop through all the Services/Operations and shut them down to ensure that the TCP disconnect is sent.

We had a consultant create a script for us that uses a regex to loop through and shut down those Services/Operations, but running the script to bring everything down takes a good 10 minutes or so to disable all the Service and Operation objects.

When we tested the cutover in our test environment, it took 20 minutes in total to bring down Ensemble (2018.1.3) and bring up IRIS (2022.1) with the network changes.

However, we forgot this step and had to ask several downstream systems to restart their interfaces: they were still hung on the previous connections and have no mechanism to detect that the connection had been dropped.

Is there a way to cut down the response time of EnableConfigItem()?

I recommend checking this article, but here's a summary:

1. Calculate a list of business hosts (BHs) that need a restart (not sure why you need a regex; all BHs are in the Ens_Config.Item table):

SELECT %DLIST(Name) bhList
FROM Ens_Config.Item 
WHERE 1=1
  AND Enabled = 1
  AND Production = :production
  AND ClassName %INLIST :classList -- or some other condition

2. Restart them all at once instead of one by one:

for stop = 1, 0 {
  for i=1:1:$ll(bhList) {
    set host = $lg(bhList, i)
    set sc = ##class(Ens.Director).TempStopConfigItem(host, stop, 0)
  }
  set sc = ##class(Ens.Director).UpdateProduction()
}
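The loop in step 2 expects `bhList` to already hold the result of the step 1 query. A minimal sketch of running that query from ObjectScript with `%SQL.Statement` follows; the class `Demo.HostList` and its method `GetHostList` are hypothetical, and the sketch assumes the class names are passed as a `$lb()` list so they can be bound to `%INLIST`.

```objectscript
/// Hypothetical helper class; not part of Ens.Director or EnsLib
Class Demo.HostList
{

/// Returns a $lb() of enabled config item names in pProduction
/// whose ClassName is one of the entries in pClassList (also a $lb())
ClassMethod GetHostList(pProduction As %String, pClassList As %List) As %List
{
	set sql = "SELECT %DLIST(Name) bhList FROM Ens_Config.Item"
	set sql = sql _ " WHERE Enabled = 1 AND Production = ? AND ClassName %INLIST ?"
	set rs = ##class(%SQL.Statement).%ExecDirect(, sql, pProduction, pClassList)
	if rs.%SQLCODE < 0 {
		write "SQL error ", rs.%SQLCODE, ": ", rs.%Message, !
		quit ""
	}
	// %DLIST is an aggregate, so at most one row comes back
	quit $select(rs.%Next():rs.%Get("bhList"), 1:"")
}

}
```

For example, `set bhList = ##class(Demo.HostList).GetHostList(##class(Ens.Director).GetActiveProductionName(), $lb("EnsLib.HL7.Service.TCPService"))` would collect the enabled HL7 TCP services before running the loop.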

If the third argument to EnableConfigItem() is 1, the method will update the production on each call. That can be time-consuming, so it might be worth setting it to 0 and calling Ens.Director.UpdateProduction() once after the loop completes.

The other issue is that simply disabling a production config item will only shut it down at the next polling interval or after the currently processing request completes. This is generally a good thing, but it can take time for some interfaces.
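A minimal sketch of the "pass 0, then update once" approach described above, assuming the host names are already in a `$lb()` list (for example, built from the Ens_Config.Item query). `Demo.BulkDisable` and `DisableAll` are hypothetical names.

```objectscript
/// Hypothetical helper class; not part of Ens.Director or EnsLib
Class Demo.BulkDisable
{

/// Disables every host in pHostList ($lb() of config item names) without
/// updating the production per call, then applies all changes at once
ClassMethod DisableAll(pHostList As %List) As %Status
{
	set sc = $$$OK
	for i=1:1:$listlength(pHostList) {
		// 3rd argument 0: save the change only, don't update the production yet
		set sc1 = ##class(Ens.Director).EnableConfigItem($listget(pHostList, i), 0, 0)
		if $$$ISERR(sc1) { set sc = $$$ADDSC(sc, sc1) }
	}
	// A single UpdateProduction() call stops all of the disabled hosts together;
	// each host may still finish its current request or polling cycle first
	set sc1 = ##class(Ens.Director).UpdateProduction()
	quit $$$ADDSC(sc, sc1)
}

}
```

The per-item errors are accumulated with `$$$ADDSC` rather than aborting, so one bad host name does not leave the rest of the list enabled.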

For @Eduard Lebedyuk's benefit: the regex @Scott Roth referred to is most likely there to allow selective shutdown of interfaces by name pattern, to accommodate outages or upgrades of external systems. Alternatively, it may be there to disable inbound interfaces before outbound interfaces so that messages are not left sitting in queues.
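A sketch of both ideas together, selecting hosts by name pattern and stopping inbound services before everything else, is below. It is an illustration under assumptions, not the consultant's script: `Demo.PatternDisable` is hypothetical, it treats any class extending `Ens.BusinessService` as "inbound", and it uses `$match`, which must match the whole config item name.

```objectscript
/// Hypothetical helper class; not part of Ens.Director or EnsLib
Class Demo.PatternDisable
{

/// Disables enabled hosts in pProduction whose name matches pPattern:
/// business services first, then all remaining matching hosts
ClassMethod Run(pProduction As %String, pPattern As %String) As %Status
{
	set sc = $$$OK
	set sql = "SELECT Name, ClassName FROM Ens_Config.Item WHERE Enabled = 1 AND Production = ?"
	set rs = ##class(%SQL.Statement).%ExecDirect(, sql, pProduction)
	quit:rs.%SQLCODE<0 ##class(%Exception.SQL).CreateFromSQLCODE(rs.%SQLCODE, rs.%Message).AsStatus()
	set (services, others) = ""
	while rs.%Next() {
		set name = rs.%Get("Name")
		// $match requires the regex to match the entire name
		continue:'$match(name, pPattern)
		if $classmethod(rs.%Get("ClassName"), "%Extends", "Ens.BusinessService") {
			set services = services _ $lb(name)
		} else {
			set others = others _ $lb(name)
		}
	}
	// Pass 1 stops inbound services so no new messages are queued;
	// pass 2 stops the remaining processes and operations
	for list = services, others {
		for i=1:1:$listlength(list) {
			set sc = $$$ADDSC(sc, ##class(Ens.Director).EnableConfigItem($listget(list, i), 0, 0))
		}
		set sc = $$$ADDSC(sc, ##class(Ens.Director).UpdateProduction())
	}
	quit sc
}

}
```

For instance, `do ##class(Demo.PatternDisable).Run(##class(Ens.Director).GetActiveProductionName(), "LAB_.*")` would target hosts whose names start with a hypothetical `LAB_` prefix.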

Using the suggestions above, looping through the config items doesn't appear to be the issue. I ran into an issue with UpgradeProduction. In total it took 18 minutes to shut down all the Services in just one namespace when I looped through all the config items, disabled them, and then updated the production with no timeout or force flag. Is there a way to make UpgradeProduction run faster besides setting the timeout or force flag?

By adding a timeout of 2 and setting the force flag, I was able to get it down from 18 minutes to roughly 11 minutes. If I remove the timeout and force flag and watch the output, I might be able to find the problematic children. I wonder if using Ens.Job would be any easier: the objects have already been disabled, so it's a matter of getting the jobs to stop and send the unbind.

Has anyone used Ens.Job before to quickly bring down a namespace in a controlled environment? We need to ensure the unbinds are sent so we can move to a new IP address on a different section of the network.
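To compare runs like the 18-minute and 11-minute ones described above, it can help to time the single UpdateProduction() call directly. A minimal sketch, assuming the hosts were already disabled with `EnableConfigItem(host, 0, 0)`; `Demo.TimedUpdate` is hypothetical and the default of 10 seconds is chosen here only for illustration.

```objectscript
/// Hypothetical helper class; not part of Ens.Director or EnsLib
Class Demo.TimedUpdate
{

/// Applies pending EnableConfigItem(host, 0, 0) changes and reports how long
/// UpdateProduction() took with the given timeout and force flags
ClassMethod Run(pTimeout As %Numeric = 10, pForce As %Boolean = 0) As %Status
{
	set start = $zhorolog
	// pTimeout: seconds to wait for the hosts to shut down
	// pForce: 1 = forcibly stop jobs that have not shut down within pTimeout
	set sc = ##class(Ens.Director).UpdateProduction(pTimeout, pForce)
	write "UpdateProduction(", pTimeout, ", ", pForce, ") took "
	write $fnumber($zhorolog - start, "", 1), " s", !
	if $$$ISERR(sc) { do $system.Status.DisplayError(sc) }
	quit sc
}

}
```

Calling `do ##class(Demo.TimedUpdate).Run(2, 1)` mirrors the timeout-of-2-plus-force experiment; keep in mind that forcing jobs down trades graceful connection termination for speed.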

I don't think so.

UpdateProduction (I think that's what you meant) is attempting to obtain state information for all of the business hosts and likely won't complete until they're all down. Calling it at the end should still be faster than having it enabled for each EnableConfigItem() call.

The reality is that you appear to have a lot of processes that are dependent on polling rates and/or getting the appropriate responses back from external systems on notification they're terminating connections.

If you need to shut down the interfaces fast, you really can only do it at the expense of graceful connection termination.

Have you considered creating separate namespaces and compartmentalizing interfaces to keep your productions at a more manageable size? Business hosts in multiple smaller productions benefit from parallelism when performing administrative tasks like stopping/starting interfaces in bulk.

Recently I wrote a snippet to determine which business host took too long to stop:

Class Test.ProdStop
{

/// do ##class(Test.ProdStop).Try()
ClassMethod Try()
{
	set production = ##class(Ens.Director).GetActiveProductionName()
	set rs = ..EnabledFunc(production)
	if rs.%SQLCODE && (rs.%SQLCODE '= 100) {
		write $$$FormatText("Can't get enabled items in %1, SQLCode: %2, Message: %3", production, rs.%SQLCODE, rs.%Message)
		quit
	} 
	
	while rs.%Next() {
		set bh = rs.Name
		set start = $zh
		set sc = ##class(Ens.Director).EnableConfigItem(bh, $$$NO, $$$YES)
		set end = $zh
		set duration = $fn(end-start,"",1)
		write !, $$$FormatText("BH: %1, Stopped in: %2, sc: %3", bh,  duration, $case($$$ISOK(sc), $$$YES:1, :$system.Status.GetErrorText(sc))), !
		if duration>60 {
			write !, $$$FormatText("!!!!!!! BH: %1 TOOK TOO LONG !!!!!!!", bh),!
		}
	}
}

Query Enabled(production) As %SQLQuery
{
SELECT 
	Name 
	, PoolSize
FROM Ens_Config.Item 
WHERE 1=1
	AND Production = :production
	AND Enabled = 1
}

}

It stops BHs one by one, measuring how long it took to stop each one.

I would recommend you try to determine which items are taking too long to stop.

Export the production before running this code so you don't have to manually re-enable all the hosts afterwards.
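A minimal sketch of that export step, assuming the active production class is the one you want to back up and that `/tmp/` is writable; `Demo.ProdBackup` and the file location are hypothetical.

```objectscript
/// Hypothetical helper class; not part of Ens.Director or EnsLib
Class Demo.ProdBackup
{

/// Exports the active production class to pDir before bulk-disabling hosts,
/// so the original Enabled settings can be reloaded later
ClassMethod Backup(pDir As %String = "/tmp/") As %Status
{
	set production = ##class(Ens.Director).GetActiveProductionName()
	quit:production="" $system.Status.Error(5001, "No active production")
	set file = pDir _ production _ ".xml"
	set sc = $system.OBJ.Export(production _ ".cls", file)
	if $$$ISOK(sc) { write "Exported ", production, " to ", file, ! }
	quit sc
}

}
```

One possible way to restore is `do $system.OBJ.Load(file, "ck")` followed by `##class(Ens.Director).UpdateProduction()`; it is worth verifying that sequence on a test system before relying on it during a cutover.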