Hello Stefan,

Thanks for the reference, but I'm still not sure about step #2: ^JRNRESTO provides an option (the default) to disable journaling of updates during the restore to make the operation faster; see step #10 of Restore Globals From Journal Files Using ^JRNRESTO. Besides, this is the only option compatible with parallel dejournaling. So switching off journaling system-wide looks excessive.


I've succeeded with the code: 

%SYS>s P("Globals")="%DEFAULTDB"
%SYS>s P("Library")="IRISLIB"
%SYS>s P("Routines")="%DEFAULTDB"
%SYS>s P("SysGlobals")="IRISSYS"
%SYS>s P("SysRoutines")="IRISSYS"
%SYS>s P("TempGlobals")="IRISTEMP"
%SYS>Set tSC=##class(Config.Namespaces).Create("%All",.P) zw tSC
%SYS>w $zv
IRIS for UNIX (Ubuntu Server LTS for x86-64) 2021.1 (Build 215) Wed Jun 9 2021 09:48:30 EDT

...to brand new server with iris2020.1

It's up to you, but why not install a brand new IRIS on your brand new server? As you may have noticed, InterSystems is actively developing IRIS and usually doesn't release as many minor maintenance versions as it did for Caché (e.g. IRIS 2020.1.1 vs Caché 2018.1.5). By choosing 2020.1, you would install a version that is near the end of its support cycle; see Minimum Supported Version Rules.

This (getFiles) method is marked as internal in Caché, and yes, it's a typical internal method, as using it relies on solid knowledge of the internals :). Besides, it's hidden in IRIS, so its caller should be rewritten to achieve DBMS independence:

 ClassMethod ListDir2(path = "", wildchar = "*", recursive As %String(VALUELIST=",y,n") = "y", ByRef dirlist)

#if $zversion["IRIS"
 @temp@(pTempNode)  ;zw dirlist

These macros are defined in quite different ways: $$$ISWINDOWS is calculated using a system call and is always 1 on the Windows platform; in contrast, $$$WindowsCacheClient is defined manually, so it can easily be set to 0 if needed.

Many years ago I faced a problem with LDAP which was solved this way (I didn't change ISC code; it was my own "fork"). I don't remember other details, only the fact.


If you are really moving the files every day, you don't need to check the date: there are no old files in your in-folders, because they have been removed by the mv (move) command. Most pieces of software that do similar tasks (e-mail clients and servers, SMS processors, etc.) work this way, moving files rather than just copying them. The simpler the better, isn't it?
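If the move is done from ObjectScript rather than from the shell, the same idea can be sketched as below. This is only an illustration under assumptions: the class name Demo.FileMover, the method name and both directory arguments are hypothetical, and error handling is reduced to a counter.

```objectscript
Class Demo.FileMover
{

/// Move every file from inDir to doneDir; return the number of files moved.
/// Both directories are assumed to exist (hypothetical example).
ClassMethod MoveAll(inDir As %String, doneDir As %String) As %Integer
{
    set moved=0
    set rs=##class(%ResultSet).%New("%File:FileSet")
    $$$ThrowOnError(rs.Execute(inDir,"*"))
    while rs.Next() {
        continue:rs.Get("Type")="D"  // skip subdirectories
        set src=rs.Get("Name")       // full path of the source file
        set dst=##class(%File).NormalizeFilename(##class(%File).GetFilename(src),doneDir)
        // Rename() moves the file; after it succeeds nothing old is left in inDir
        if ##class(%File).Rename(src,dst) { set moved=moved+1 }
    }
    quit moved
}

}
```

Because processed files leave the in-folder, the next run sees only new arrivals and needs no date filter.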

It depends.

Switch 10, which inhibits all global/routine access except by the process that sets the switch, should meet your needs, although setting it can interfere with your _own_ activity.
Switch 12, which disables logins, can be insufficient for disabling web access, which is easier to restrict by stopping the web server.

I haven't personally experimented with those switches, as we have no such problem: our application uses its own "disable logins" flag, implemented by locking a variable.
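A lock-based flag of this kind might look roughly like the sketch below. It is not our actual code: the class name Demo.LoginGate, the global name ^AppLoginGate and both method names are hypothetical, chosen only to show the idea.

```objectscript
Class Demo.LoginGate
{

/// Maintenance side: take the exclusive lock to close the gate.
/// The lock stays held until this process releases it or exits.
ClassMethod DisableLogins(timeout As %Integer = 5) As %Boolean
{
    lock +^AppLoginGate:timeout
    quit $test
}

/// Login side: a shared lock can be taken only while
/// no other process holds the exclusive lock.
ClassMethod LoginsAllowed() As %Boolean
{
    lock +^AppLoginGate#"S":0
    if '$test { quit 0 }
    lock -^AppLoginGate#"S"
    quit 1
}

}
```

A side benefit of using a lock rather than a stored value is that the flag clears itself if the maintenance process dies.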

Just adding 2c to Kevin's reply.

Most hosts that support TCP also support TCP Keepalive  

Besides, the server application should support it. A keepalive time of 3 hours is not typical; it sounds like your server app is not tuned for keepalive or doesn't support it at all.

In the case of IRIS/Caché, you should explicitly set some options on the connected server socket, e.g.:

start(port) // start port listener
  set io="|TCP|"_port open io:(:port:"APSTE"):20 e  quit
  while 1 {
    use io read x
    u $p // connection is accepted: fork child process

/KEEPALIVE=60 to set keepalive time to 60 seconds
/POLLDISCON to switch on TCP probes.
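As a minimal sketch, assuming io is the already opened TCP device from the listener code above, both options could be applied with one USE command (the routine label and the exact placement of the call in your own code are assumptions):

```objectscript
setopts(io) // apply keepalive settings to an open TCP device
  // /KEEPALIVE=60 : keepalive time of 60 seconds
  // /POLLDISCON   : switch on TCP probes to detect disconnects
  use io:(/KEEPALIVE=60:/POLLDISCON)
  quit
```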

After re-reading the excellent articles referenced above, it seems that:
1) Too low a QoS value can be incompatible with VM stun time.
2) Too high a value can be inappropriate as well, for other reasons. E.g., it can postpone a failover when it is really needed because the Primary has crashed or become isolated.
So why not stop worrying about the QoS value and just Set No Failover during the snapshot phase? The documentation describes how to do it manually, but it should be possible programmatically as well.

Here is my solution. A couple of words as a preface. There are two tasks:

#1. The first task:

  • Switches the journal and records the name of the new journal file (e.g., in @..#GtrlJ@("JrnFirst")).
  • Processes the globals of a namespace. The processing algorithm doesn't matter here; it's usually some kind of data re-coding.

#2. This task is needed only because user activity during the execution of task #1 can change globals that task #1 has already processed.

  • Wait for the next journal file to become available for processing (WaitForJrnSwitch());
  • Process the globals found in this journal using an algorithm similar to that of task #1.
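The two steps of task #2 can be sketched as a simple loop. Here ProcessJournal() is a hypothetical placeholder for the re-coding step, and the method name RunTask2 is an assumption as well; only WaitForJrnSwitch() and %JrnID come from my code.

```objectscript
/// Task #2 driver (sketch): handle journals one by one until no new
/// journal appears within ..#TimeLimit.
ClassMethod RunTask2()
{
    while ..WaitForJrnSwitch() {
        // %JrnID now holds the ID of a journal that is closed and safe to read
        do ..ProcessJournal(%JrnID)  // hypothetical: re-process globals found in it
    }
}
```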

Below is pseudo-code of the WaitForJrnSwitch() method and its helper, GetJrnID().

/// If a new journal is available, set %JrnID to its ID and return 1;
/// otherwise wait in ..#TimeWait steps until ..#TimeLimit is reached
ClassMethod WaitForJrnSwitch() As %Boolean
{
 set rc=0
 set nTimes = ..#TimeLimit \ ..#TimeWait
 for i=1:1:nTimes {
  $$$TOE(sc, ..GetJrnID(.JrnID)) // current journal
  if $get(%JrnID)="" {
    set JrnNext=@..#GtrlJ@("JrnFirst")
  } else {
    set JrnNext=%JrnID+1
  }
  if JrnNext<JrnID { // avoid extra journal switching ("by restore")
    set %JrnID=JrnNext
    set rc=1
    quit  // leave the loop: a journal is ready
  }
  hang ..#TimeWait
 }
 quit rc
}

/// Get the Jrn ID of the current journal file
/// Out:
/// returns %Status;
/// pJrnID - journal file name w/o prefix and "."
ClassMethod GetJrnID(Output pJrnID) As %Status
{
 set sc=$$$OK
 try {
   // file name without the directory part
   set file=##class(%File).GetFilename(##class(%SYS.Journal.System).GetCurrentFileName())
   set prefix=##class(%SYS.Journal.System).GetJournalFilePrefix()
   // strip the prefix and the dot, leaving digits only
   set pJrnID=$tr($e(file,$l(prefix)+1,*),".")
 } catch ex {
   set sc=$$$ERROR($s(ex.%IsA("%Exception.SystemException"):5002,'ex.Code:5002,1:ex.Code),$lg(ex.Data)_" "_ex.Location_" "_ex.Name)
 }
 quit sc
}
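
To make the ID derivation concrete, here is the same expression applied to made-up values; the prefix "PFX" and the file name are hypothetical, not taken from a real system.

```objectscript
 set prefix="PFX"
 set file="PFX20240115.003"
 // drop the prefix, then remove the dot between the date and the counter
 write $tr($e(file,$l(prefix)+1,*),".")   // prints 20240115003
```

The result is purely numeric, which is why WaitForJrnSwitch() can compare IDs with < and compute the next one with +1.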