New test I've managed to write:
a) is very close to real world example where the "puzzle" originates from,
b) is free from file i/o as its speed can differ on docker version from the Windows one,
c) do some (temporary) globals read/write, while all globals involved are entirely cashed.

Its results on IRIS 2025.1 CE/docker impressed me: in this version it runs 4-5 times faster than in native IRIS 2022.1/Windows and doesn't slow down after intermediate call of a big routine function. This is true even when both routines involved (small ztestLib and big ztestBigMac) were generated as classic (non procedure) functions collections. In the case of procedures results were good in both versions involved into my testing.

If anybody wants to reproduce this new test on native Windows or Linux IRIS 2025.1 or 2024.1 version, I would be happy to upload necessary code and data (zipped GOF file of 40MB, 250 after unpacking).

Robert, it sounds strange, but... Setting the Time Zone

You can use $ZTIMEZONE to set the time zone used by the current InterSystems IRIS process. Setting $ZTIMEZONE does not change the default InterSystems IRIS time zone or your computer’s time zone setting.

IMHO, $ztimezone setting is dangerous not for its system-wide effect (which it hasn't) but mostly due to its exclusions and anomalies, despite they are accurately listed in docs.

Hello Stefan,

Thanks for the reference, while I'm still not sure about the step #2 as ^JRNRESTO provides the  (defaulted) option to disable journaling of updates during the restore to make the operation faster, see the step #10 of Restore Globals From Journal Files Using ^JRNRESTO. Besides, this is the only option compatible with parallel dejournaling. So the idea to switch off journaling system-wide looks excessive.

Regards,
Alexey

I've succeeded with the code: 

%SYS>s P("Globals")="%DEFAULTDB"
%SYS>s P("Library")="IRISLIB"
%SYS>s P("Routines")="%DEFAULTDB"
%SYS>s P("SysGlobals")="IRISSYS"
%SYS>s P("SysRoutines")="IRISSYS"
%SYS>s P("TempGlobals")="IRISTEMP"
%SYS>Set tSC=##class(Config.Namespaces).Create("%All",.P) zw tSC
tSC=1
%SYS>w $zv
IRIS for UNIX (Ubuntu Server LTS for x86-64) 2021.1 (Build 215) Wed Jun 9 2021 09:48:30 EDT

...to brand new server with iris2020.1

It's up to you, but why not install brand new IRIS on your brand new server? As you may notice, InterSystems is actively develop IRIS and usually doesn't release minor updated versions as it was in the case of Cache (e.g. IRIS 2020.1.1 vs Cache 2018.1.5). Choosing 2020.1, you are going to install the version which is near the end of its support cycle, see Minimum Supported Version Rules.

This (getFiles) method is marked as internal in Cache, and yes, it's typical internal as it's usage is relied on the strong internals knowledge :). Besides, it's hidden in IRIS, and its caller should be rewritten to achieve DBMS independence:

 ClassMethod ListDir2(path = "", wildchar = "*", recursive As %String(VALUELIST=",y,n") = "y", ByRef dirlist)
{
 pExtension=1
 pExtension(1)=wildchar

#if $zversion["IRIS"
 temp=$name(^IRIS.Temp)
#else
 temp=$name(^CacheTemp)
#endif
 
 pTempNode=$i(@temp)
 @temp@(pTempNode)
   
 ##class(%SQL.Util.Import).getFiles(path,.pExtension,pTempNode,recursive="y")
 dirlist=@temp@(pTempNode)
 @temp@(pTempNode)  ;zw dirlist
}

Methods how these macros are defined are quite different: $$$ISWINDOWS is calculated using system call and always 1 on Windows platform, in contrast $$$WindowsCacheClient is defined manually, so it and can be easily set to 0 if needed.

Many years ago I faced a problem with LDAP which was solved this way (I didn't change ISC code, it was my own "fork"). Don't remember other details, only the fact.

Yone,

if you really moving the files every day, you don't need to check the date: there are no old files in your in-folders, because they have been deleted with mv  (move) command. Most pieces of software which does the similar tasks (e-mail clients and servers, SMS processors, etc) do it this way, moving files rather than just copying them. The simpler the better, isn't it? 

It depends.

Switch 10 which inhibits all global/routine access except by the process that sets this switch should meet, while setting it can interfere with your _own_ activity.
Switch 12 which disables logins can be insufficient for disabling web access, which is easier to restrict by stopping web server.

I didn't personally experiment with those switches as we have no such problem because our application utilizes its own "disable logins" flag by locking the variable.

Just adding 2c to Kevin's reply.

Most hosts that support TCP also support TCP Keepalive  

Besides, server application should support it. 3 hours keepalive time setting is not typical; it sounds like your server app not tuned for keepalive support or doesn't support it at all.

In case of IRIS/Caché, you should explicitly set some options on connected server socket, e.g.:

start(port) // start port listener
io="|TCP|"_port io:(:port:"APSTE"):20 e  quit
  while 1 {
io x
   u $p // connection is accepted: fork child process
child:(:5:io:io)
  }
 
child
  use $p:(:/KEEPALIVE=60:/POLLDISCON)
  ...

/KEEPALIVE=60 to set keepalive time to 60 seconds
/POLLDISCON to switch on TCP probes.

After re-reading excellent articles referenced above, it seemed that:
1) Too low QoS value can be incompatible with VM Stun time.
2) Too high value can be inappropriate as well for some other reasons. E.g., it can postpone a failover when it's of real need when Primary crashed or isolated.
So, why not stop bothering about QoS value, and just Set No Failover during snapshot phase? Documentation describes how to do it manually, while it should be possible programmatically as well.