Hi James,

Agree with you, parsing terminal output is not the smartest solution. I always try to avoid it by using intermediate files. E.g. (from real life):

rm -f $mydir/db_temp
NspToCheck=$nspace

# run a non-interactive Caché session; its terminal output is discarded,
# the result is returned via the intermediate file instead
csession $instance -U%SYS << EOF > /dev/null
set fn="$mydir/db_temp"
o fn:("WNS"):1 u fn
try {
  zn "$NspToCheck"  ; throws if the namespace is unknown
  w \$s(\$zu(5)=\$zcvt("$NspToCheck","U"):\$zutil(12,""),1:"$NspToCheck")  ; default DB directory
} catch {
  w \$p(\$ze,">")_">"  ; the error code only
}
c fn
h
EOF

DbQms=$(cat "$mydir/db_temp")

Here the default DB directory of the namespace $NspToCheck (or the $ZError code) is written to the $mydir/db_temp file; then it goes to the $DbQms shell variable and is processed as needed.

The initial answer has been amended.

Your sample works for me: 

USER>s rc = $zf(-100,"/SHELL","pwd")
/cachesys/mgr/user

USER>w $zv
Cache for UNIX (Ubuntu Server LTS for x86-64) 2017.2.2 (Build 865U) Mon Jun 25 2018 10:48:26 EDT

What Caché version are you using? $zf(-100) was added in 2017.2.1 (Build 801_3). For older versions use $zf(-1) or $zf(-2), though upgrading to the latest Caché (or even IRIS?) release would probably be the better choice.
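
For reference, a rough fallback sketch for older versions: unlike $zf(-100), $zf(-1) returns only the OS command's exit status, so the output has to be captured via a file (the /tmp path is illustrative):

USER>s rc=$zf(-1,"pwd > /tmp/pwd.txt") ; rc holds the exit status, 0 on success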

I use similar code for the same purpose, and it works. Here is my (debug) version, written a while ago:

  ; In:
  ; pFirst - 1st journal file,
  ; pLast - last journal file,
  ; pSDB - source DB directory where journals were generated,
  ; pTDB - target DB directory where journals should be applied
  ;
  ; Sample:
  ; zn "test" set target=$zu(12,""), source=target ; start in target namespace assuming source == target
  ; k ^mtempfilt ; if you injected your ZJRNFILT
  ; d jrnrest^ztestFF("20190917.001","20190919.008",source,target)
  ; zwrite ^mtempfilt
  ; 
 
jrnrest(pFirst,pLast,pSDB,pTDB)
 new (pFirst,pLast,pSDB,pTDB) ;de!!!
 new $namespace set $namespace="%SYS"
 set pLast=$g(pLast,pFirst)
 set pSDB=$g(pSDB,$zu(12)_"user\")
 set pTDB=$g(pTDB,$zu(12)_"test\")
 set RestOref=##class(Journal.Restore).%New()
 set RestOref.FirstFile=pFirst
 set RestOref.LastFile=pLast
 set RestOref.RollBack=0 ;? default = 1
 write !,pFirst_" to "_pLast_"; "_pSDB_" => "_pTDB_" OK?" read ans#1 write ! quit:"nN"[ans
 set sc=RestOref.RedirectDatabase(pSDB,pTDB) if 'sc goto jrnbad
 set sc=RestOref.SelectUpdates(pSDB) if 'sc goto jrnbad ; all globals; need it to fire RedirectDatabase
 set RestOref.Filter="^ZJRNFILT1" ; means nothing as ^ZJRNFILT is always used
 set t0=$zh
 set CHATTY=0 ; has no effect
 set sc=RestOref.Run() if 'sc goto jrnbad
 set t1=$zh
 write "sc="_sc_" dt="_$fn(t1-t0,"",3),!
jrnbad
 if $g(sc)'="",'sc do $system.Status.DisplayError(sc)
 quit

Why do you think that ZJRNFILT doesn't work? Maybe your journal files just don't contain any `set ^ABC(I)=I` commands. After all, such guesses are easy to check: just inject a couple of statements into your code:

ZJRNFILT(jid,dir,glo,type,restmode,addr,time) /*Filter*/
  Set restmode=1                              /*Return 1 for restore*/
  If glo["ABC",type="S",$i(^mtempfilt("S"))
  If glo["ABC",type="K",$i(^mtempfilt("K")) Set restmode=0 /*except if a kill on ^ABC*/
  Quit

and check ^mtempfilt value after the journal restoration:

  zw ^mtempfilt

Our current plan is to continue development in Caché, with some kind of code converter to IRIS. It would be nice if ISC provided such a utility itself, at least as a starting point for further customisation.

Taking this approach, we would not lose any functionality. We confess that this way we are restricting ourselves to the features available on both platforms, but that does not seem a huge price for the benefits of a single code base.

You'd hardly find a Caché plugin for any 3rd-party tool, while the SNMP support included in all InterSystems products allows easy integration with most of them (Zabbix, Nagios, etc.). It's even possible to write a custom MIB to provide additional metrics; we tried it and it really worked.

Nowadays more lightweight approaches are gaining popularity, such as the Grafana-based ones mentioned by Evgeny. I'd personally recommend an enterprise-scale tool such as Zabbix if and only if you really need a solution that monitors all parts of your data center infrastructure (servers, OS, LAN, etc.) besides the DBMS; otherwise it would be overkill, and I'd prefer a Grafana-based one.

This can be a sign of overly long Write Daemon buffer queues on your DR Backup member. The reason can be insufficient throughput of its disk I/O subsystem and/or disk I/O problems. I would start by looking through cconsole.log for messages indicating:

  • abnormal latency of disk access
  • Write Daemon non-responsive "more than 5 minutes" states
  • Write Daemon panic
  • and so on.

The next step would be to start pButtons monitoring around the clock to pin down the suspected issue(s) more precisely. Looking at OS performance logs can be helpful as well.
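
For instance, a 24-hour collection can be kicked off programmatically with one of the profiles pButtons ships with (run it from the %SYS namespace; the "24hours" profile name is the stock one and may differ in your installation):

  %SYS>do run^pButtons("24hours")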

Continuing Robert's list:

5) CACHETEMP is always local on APP servers (= ECP clients). In our experience this is a very important feature, because it keeps the processing of temporary data local, avoiding extra network traffic and ECP server disk I/O workload. One of the surprises ECP gave me: while it's relatively easy to achieve high intra-ECP network speed (~10 Gbit/s hardware is readily available), the ECP server's disk I/O subsystem can easily become a bottleneck unless you carefully spread data processing among the ECP clients.

6) Sergio, you wrote:

I mean, all the data would be stored on disk and will have to be synchronized through the net with the other APP servers.

If you really need to synchronize even temporary data, then simple horizontal scaling with ECP, without some optimization of data processing (see item 5), can be less cost-effective than a comparable vertical scaling solution.

This is the well-known classic approach to simulating asynchronous behavior in a "synchronous only" language. It potentially suffers from two problems, both caused by the need to check whether something has arrived on the TCP connection:

1) Read x:timeOut (where timeOut>0) causes delays of up to timeOut seconds, which are more or less acceptable for a background job but not for the foreground (e.g., some UI handling).

2) Read x:0 is too unreliable, as it succeeds only if the data happens to be already in the buffer at that very moment.
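
To illustrate problem 1, here is a minimal sketch of the polling pattern in question (tcpdev and handle are illustrative names, not from anybody's original code):

poll ; check the TCP device for data, waiting at most 1 second
 new x
 use tcpdev read x:1
 if $test do handle(x) quit  ; $TEST=1: data arrived before the timeout
 ; $TEST=0: the timeout expired, i.e. the delay described in problem 1
 quit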

Agree with Dmitry: it helps that "not all those tasks have to be asynchronous".

A customer of ours runs a production system of two mirrored DB servers and 13 APP ones, operating 24x7 with a permissible downtime of <= 2 hours once a month.

The total production DB size is ~4 TB. A full backup takes ~10 hours, its restore ~16 hours. Therefore incremental backups are scheduled nightly and a full backup weekly; to avoid a performance penalty, the Mirror Backup server is assigned to run all this activity.

As to the integrity check, it turned out to be impractical once the DB size outgrew 2 TB: a weekend was no longer enough to run it through, so it was disabled. This doesn't seem to be a great problem because:

  • Enterprise-grade SANs are very reliable; the probability of errors is very low.
  • Let's imagine that an integrity error occurred. Which database blocks would most probably be involved? Those actively changed by users' processes, rather than those residing in the "stale" part of the database. So the errors would most likely show up at the application level sooner than they could be caught by an integrity check. What do you think the administrator would do in case of such errors? Stop production for several days to run an integrity check and fix the errors? No, they would just switch from the damaged Primary to the (healthy) Backup server. And what if the Backup server is damaged as well? As the two reside on separate disk groups, this scenario means a real disaster, and the best solution then is to switch to the Async Backup member residing at another data center.

Thanks, Larry,
Adding your variant to the "code base".
Like most of the others, your solution needs the * length correction *, which I'm keeping outside the Xecute code to improve readability.

lpad(number=1,length=4)
  new code,i,z,sign
  set number=+number
  set sign="" set:number<0 sign="-",number=-number
  set code($increment(code))="w sign_$tr($j(number,length),"" "",""0"")"
  set:length<$l(number) length=$l(number) ;* length correction *
  set code($increment(code))="w sign_$e(1E"_length_"+number,2,*)"
  set code($increment(code))="w sign_$e(10**length+number,2,*)"
  set code($increment(code))="w sign_$e($tr($j("""",length),"" "",0)_number,*-length+1,*)"
  set code($increment(code))="s $P(z,""0"",length)=number w sign_$E(z,*-(length-1),*)"
  for i=1:1:code write code(i),"  => ",?60 xecute code(i) write !
  quit

If we are looking for a more generic solution, e.g. padding <number> with zeroes to a field of the given <length>, let's try:

lpad(number=1,length=4)
  new code,i
  set code($increment(code))="w $tr($j(number,length),"" "",""0"")"
  set code($increment(code))="w $e(1E"_length_"+number,2,*)"
  set code($increment(code))="w $e(10**length+number,2,*)"
  set code($increment(code))="w $e($tr($j("""",length),"" "",0)_number,*-length+1,*)"
  for i=1:1:code write code(i)," => " xecute code(i) write !
  quit

Some results are:

for n=999,9999,99999 d lpad^ztest(n,4) w !
 
w $tr($j(number,length)," ","0") => 0999
w $e(1E4+number,2,*) => 0999
w $e(10**length+number,2,*) => 0999
w $e($tr($j("",length)," ",0)_number,*-length+1,*) => 0999
 
w $tr($j(number,length)," ","0") => 9999
w $e(1E4+number,2,*) => 9999
w $e(10**length+number,2,*) => 9999
w $e($tr($j("",length)," ",0)_number,*-length+1,*) => 9999
 
w $tr($j(number,length)," ","0") => 99999
w $e(1E4+number,2,*) => 09999
w $e(10**length+number,2,*) => 09999
w $e($tr($j("",length)," ",0)_number,*-length+1,*) => 9999

Only the first solution (John's) provides a valid result even with "bad" input data ($length(99999) > 4). The others can't be amended without extra effort that would be disproportionate to this tiny task. Just to complete it:

lpad(number=1,length=4)
 new code,i
 set code($i(code))="w $tr($j(number,length),"" "",""0"")"
 set code($i(code))="w $e(1E"_$s(length>$l(number):length,1:$l(number))_"+number,2,*)"
 set code($i(code))="w $e(10**$s(length>$l(number):length,1:$l(number))+number,2,*)"
 set code($i(code))="w $e($tr($j("""",$s(length>$l(number):length,1:$l(number))),"" "",0)_number,*-$s(length>$l(number):length,1:$l(number))+1,*)"
 for i=1:1:code write code(i)," => ",?60 xecute code(i) write !
 quit

Now all the solutions are correct even with <number> >= 99999, but only the first one keeps its simplicity.

Hi Pete,
we have implemented a model that is very similar to yours, with slight differences:

  • we use it in-house only, where we have several development and testing Caché instances;
  • most of them are not connected via Mirroring and/or ECP;
  • all Caché users are LDAP users, so we don't need to bother with creating/modifying users on a per-instance basis;
  • one instance is used as a repository of role definitions (the so-called Roles Repository); it "knows" about each role that should be defined on each instance; this repository is wrapped with a REST service;
  • each role has a "standard" name, e.g. roleDEV, roleUSER, roleADM; these names are used in LDAP users' definitions and retrieved during the LDAP authentication process;
  • the resource lists for each standard role are stored in the Roles Repository; those lists are associated with instance addresses (server+port), so each role can be defined differently for different Caché instances; therefore, a user with the role roleDEV can have different privileges on different servers;
  • each Caché instance queries the Roles Repository at startup; after getting the current definitions of its roles, it applies them to its Caché security database (a minimal sketch of this step follows the list).
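
Here is a rough sketch of that startup step, not our actual code: the repository address, the /roles endpoint, and the line format are hypothetical; only %Net.HttpRequest and the Security.Roles calls are real API:

RolesSync ; query the Roles Repository and (re)define the local roles
 new $namespace set $namespace="%SYS"
 new req,line,role,resources,prop
 set req=##class(%Net.HttpRequest).%New()
 set req.Server="rolesrepo.example.com",req.Port=8080  ; hypothetical address
 quit:'req.Get("/roles?instance="_$zcvt($zu(110),"O","URL"))
 while 'req.HttpResponse.Data.AtEnd {
   set line=req.HttpResponse.Data.ReadLine()  ; assumed format: role|resource:permission,...
   set role=$p(line,"|"),resources=$p(line,"|",2)
   if ##class(Security.Roles).Exists(role) {
     kill prop
     set prop("Resources")=resources
     do ##class(Security.Roles).Modify(role,.prop)
   }
   else {
     do ##class(Security.Roles).Create(role,"",resources)
   }
 }
 quit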

This solution has been used in our company's Dev & Testing environment for more than a year without major problems. It is rather flexible and doesn't depend on proprietary transport protocols. The only drawbacks found so far are:

  • the Roles Repository is automatically queried at Caché startup only, so if something in the role definitions should be changed on the fly, manual re-querying is needed;
  • sometimes InterSystems introduces new security caveats with the new versions of its products; one of them was a subject of an article here: https://community.intersystems.com/post/implicit-privileges-developer-ro....

Dear colleagues,

Thank you for paying so much attention to this tiny question. Maybe it was formulated too tersely: I should have mentioned that the impact of object instantiation is beyond the scope of the question, as all the objects are instantiated once; the corresponding OREFs are stored in global-scope variables for "public" use.

Going deep inside with %SYS.MONLBL is possible, but I'm too lazy to do it having no real performance problem. So I wrote several dummy methods, duplicated as instance and class ones, with different numbers of formal arguments, from 0 to 10. Here is the code:

Class Scratch.test Extends %Library.RegisteredObject [ ProcedureBlock ]
{

ClassMethod dummyClassNull() As %String
{
  quit 1
}

Method dummyInstNull() As %String
{
  quit 1
}

ClassMethod dummyClass5(a1, a2, a3, a4, a5) As %String
{
  quit 1
}

Method dummyInst5(a1, a2, a3, a4, a5) As %String
{
  quit 1
}

ClassMethod dummyClass10(a1, a2, a3, a4, a5, a6, a7, a8, a9, a10) As %String
{
  quit 1
}

Method dummyInst10(a1, a2, a3, a4, a5, a6, a7, a8, a9, a10) As %String
{
  quit 1
}

}

My testing routine was:

ClassVsInst
   set p1="пропоывшыщзшвыщшв"
   set p2="гшщыгвыовлдыовдьыовдлоыдлв"
   set p3="widuiowudoiwudoiwudoiwud"
   set p4="прпроыпворыпворыпворыпв"
   set p5="uywyiusywisywzxbabzjhagjЭ"
   set p6="пропоывшыщзшвыщшв"
   set p7="гшщыгвыовлдыовдьыовдлоыдлв"
   set p8="widuiowudoiwudoiwudoiwud"
   set p9="прпроыпворыпворыпворыпв"
   set p10="uywyiusywisywzxbabzjhagjЭ"
   do run^zmawr("s sc=##class(Scratch.test).dummyClass10(p1,p2,p3,p4,p5,p6,p7,p8,p9,p10)",1000000,"dummyClass10 "_$p($zv,"(Build"))
   set st=##class(Scratch.test).%New() do run^zmawr("s sc=st.dummyInst10(p1,p2,p3,p4,p5,p6,p7,p8,p9,p10)",1000000,"dummyInst10 "_$p($zv,"(Build"))
   set st=""
   do run^zmawr("s sc=##class(Scratch.test).dummyClass5(p1,p2,p3,p4,p5)",1000000,"dummyClass5 "_$p($zv,"(Build"))
   set st=##class(Scratch.test).%New() do run^zmawr("s sc=st.dummyInst5(p1,p2,p3,p4,p5)",1000000,"dummyInst5 "_$p($zv,"(Build"))
   set st=""
   do run^zmawr("s sc=##class(Scratch.test).dummyClassNull()",1000000,"dummyClassNull "_$p($zv,"(Build"))
   set st=##class(Scratch.test).%New() do run^zmawr("s sc=st.dummyInstNull()",1000000,"dummyInstNull "_$p($zv,"(Build"))
   q

run(what,n,comment) ; execute line 'what' 'n' times
   set n=$g(n,1)
   set comment=$g(comment,"********** "_what_" "_n_" run(s) **********")
   write comment,!
   set zzh0=$zh
   for i=1:1:n xecute what
   set zzdt=$zh-zzh0 write "total time = "_zzdt_" avg time = "_(zzdt/n),!
   q

The results were:

USER>d ClassVsInst^zmawr
dummyClass10 Cache for Windows (x86-64) 2017.2.2
total time = .377751 avg time = .000000377751
dummyInst10 Cache for Windows (x86-64) 2017.2.2
total time = .338336 avg time = .000000338336
dummyClass5 Cache for Windows (x86-64) 2017.2.2
total time = .335734 avg time = .000000335734
dummyInst5 Cache for Windows (x86-64) 2017.2.2
total time = .280145 avg time = .000000280145
dummyClassNull Cache for Windows (x86-64) 2017.2.2
total time = .256858 avg time = .000000256858
dummyInstNull Cache for Windows (x86-64) 2017.2.2
total time = .225813 avg time = .000000225813

So, contrary to my expectations, the oref.Method() call turned out to be quicker than its ##class(myClass).myMethod() analogue. As the effect is less than a microsecond per call, I don't see any reason for refactoring.

Hi Mark,

Several years ago we faced a similar problem. An external system needed to pull new requests from our Caché DB and push back the responses. We maintained a transition table in Caché where we placed new requests; the external system polled the table every N seconds, fetching the requests from the table and placing the responses back. Communication was implemented via ODBC.

You can do something similar, filling a transition table on the remote Caché side using triggers associated with the "main" table; see the sketch below.
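
For illustration, such a trigger might look like this; the Scratch.Main / Scratch.Transition classes and their fields are hypothetical, only the trigger mechanics are real:

/// Hypothetical "main" table; the row trigger queues each new request
Class Scratch.Main Extends %Persistent
{

Property Payload As %String;

Trigger NewRequest [ Event = INSERT, Foreach = row/object, Time = AFTER ]
{
  // {%%ID} expands to the ID of the just-inserted row
  new id
  set id={%%ID}
  &sql(INSERT INTO Scratch.Transition (RequestId, Status) VALUES (:id, 'NEW'))
}

}

The external system can then poll Scratch.Transition via ODBC and update its (hypothetical) Status column when the response is placed back.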

If you prefer to code the global size calculation yourself rather than amend ^%GSIZE, a feasible option is to call

set bSize=$$AllocatedSize^%GSIZE(global)

which returns the size in bytes of a global mapped to the current namespace. It recognizes which database the global is mapped from, so you don't need to do that yourself. The only thing you need is a list of the globals in the namespace, which can be fetched in several ways, e.g. using $Order(^$G(global)); a sketch follows the cons list below. It can be used on a per-database basis as well. Pros of this approach:
- speed, as it neither runs a query nor instantiates %SYS.GlobalQuery objects;
- AFAIR, there was an error in the global size calculation of the %SYS.GlobalQuery::Size() query in old Caché versions, up to 2015.1;
- starting from 2015.1, it can be used with subglobals.

Cons:
- this $$-function is not documented;
- not sure if it existed in 2010.1.
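
A minimal sketch of that approach (with the caveats above in mind: the entry point is undocumented, and whether the leading caret must be stripped from the global name is an assumption to verify on your version):

gtotal() ; sum of the allocated sizes, in bytes, of all globals in this namespace
 new total,name
 set total=0,name=""
 for {
   set name=$order(^$GLOBAL(name)) quit:name=""
   ; ^$GLOBAL returns names as "^XYZ"; the caret is dropped here (assumption)
   set total=total+$$AllocatedSize^%GSIZE($extract(name,2,*))
 }
 quit total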