New test I've managed to write:
a) is very close to real world example where the "puzzle" originates from,
b) is free from file i/o as its speed can differ on docker version from the Windows one,
c) do some (temporary) globals read/write, while all globals involved are entirely cashed.

Its results on IRIS 2025.1 CE/docker impressed me: in this version it runs 4-5 times faster than in native IRIS 2022.1/Windows and doesn't slow down after intermediate call of a big routine function. This is true even when both routines involved (small ztestLib and big ztestBigMac) were generated as classic (non procedure) functions collections. In the case of procedures results were good in both versions involved into my testing.

If anybody wants to reproduce this new test on native Windows or Linux IRIS 2025.1 or 2024.1 version, I would be happy to upload necessary code and data (zipped GOF file of 40MB, 250 after unpacking).

Hi Yaron,

As to my experience, MONLBL adds its 15-20% payload, so it usually can't help in searching the reason of rather small performance flaws, like in my case. But I was happy enough to get a real code example which showed ~100% drawback on the second run, and MONLBL showed me the reason: it was just a $$converIn^Lib function call. Having it in hands, I just substituted it with direct $zconvert call and got ~300% speed increase on both runs!
This "victory" didn't stop my attempts to find the reason of this issue because I anticipate similar cases with other parts of our code which would be more difficult to find and much more difficult to fix. That's why my synthetic samples use $$convertIn function in different environments: far or local calls, procedures or not - it was just a starting point for the investigation.

Vitaliy, Jon:

Firstly thank you for your interest to my "puzzle".

Vitaliy,
results would be more stable if you:

  • run each test set in a new iris session,
  • increase the repetition counter from 1E7 to 1E8.

In my case I'm getting ~6% drawback on each test set run.

Jon,
2025.1 definitely worth trying, at last for its potentially better performance. Meanwhile Vitaliy already have run my tests in 2025.1 and got the results very similar to mine.

You expected that the reason of performance drawback is access to public variables of routines being called. To check it, I've transformed the convertInLocal function to the procedure: 

convertInLocal(str,table) [] Public {
  set strout=$zconvert(str,"I",table,handle)
  quit strout_$select(handle="":"",1:"?")
}

regenerated Big Routine as a procedure: 

set sc=$$createProc^ztest22("ztestBigMac",32000,1900)

and added two new samples calling $$convertInLocal instead of $$convertIn^ztestLib. Several runs of these samples have shown the similar performance drawback (~5-7%) as those had been calling $$convertIn^ztestLib.

So, no public variables and no "classic" routines being called, but no miracle happened...

ROUTINE ztest22
Version()  quit 20250611
  ;
  ; do init^ztest22(100)  ; in one irissession
  ;
  ; in another irissession:
  ; set N=1E7 do runOne^ztest22("convertInRunFar",N,.fields) set dt1=$get(fields("dt")) do runOne^ztest22("convertInBigMacFar",N,.fields) set dt2=$get(fields("dt")) w $fn(dt2-dt1*100/dt1,"",2)_"% difference",!
  ;
runList(bKill,pList,pN)
  new iSample
  if $get(bKill,0) kill ^measure
  set pN=$get(pN,1E7)
  for iSample=1:1:$length(pList,",") do runOne($piece(pList,",",iSample),pN,.fields) merge ^measure($i(^measure))=fields
  quit
runOne(pOffset,pN,pFields) ; k  do runOne^ztest22("convertIn",1E6,.fields)
  new (pOffset,pN,pFields) ; to isolate sample's local symtab from other samples ones ;new N,i,textRef,rc,dt,pre,post,sample,category,comment
  set pN=$get(pN,1E5)
  if pOffset=+pOffset set textRef="meta+"_pOffset
  else  set textRef=pOffset
  set rc=$$getFields(textRef) if 'rc w "$text("_textRef_") does not exist",! goto runOneQ
  if pre'="" xecute pre
  if $get(N)="" set N="pN"
  if '$isValidNum(N) set N=@N
  if N=1 {
   set dt=$zhorolog
   xecute sample
   set dt=($zhorolog-dt)*1E6
  } elseif sample="set convertInx=$$convertIn^ztestLib(^mtempFF(i),""UTF8"")" {
   set dt0=$zhorolog
   for i=1:1:N set convertInx=""
   set dt0=($zhorolog-dt0)
   set dt=$zhorolog
   for i=1:1:N set convertInx=$$convertIn^ztestLib(^mtempFF(i),"UTF8")
   set dt=($zhorolog-dt-dt0)*1E6,dt=dt/N
  } elseif sample="set convertInx=$$convertIn^ztestLib(convertIny,""UTF8"")" {
   set dt0=$zhorolog
   for i=1:1:N set convertInx=""
   set dt0=($zhorolog-dt0)
   set dt=$zhorolog
   for i=1:1:N set convertInx=$$convertIn^ztestLib(convertIny,"UTF8")
   set dt=($zhorolog-dt-dt0)*1E6,dt=dt/N
  } else {
   set temp="set convertInx="""""
   set dt0=$zhorolog
   xecute "for i=1:1:N "_temp
   set dt0=($zhorolog-dt0)
   set dt=$zhorolog
   xecute "for i=1:1:N "_sample
   set dt=($zhorolog-dt-dt0)*1E6,dt=dt/N
  }
  write pOffset,?30,$fn(dt,"",3),?45,comment,!
  if post'="" xecute post
  do setFields(textRef,.pFields)
runOneQ
  quit
getFields(pLabel) ; w $$getFields^ztest22("convertIn"),! zwrite ; k  w $$getFields^ztest22("meta+"_1),! zwrite
  new meta,sep,fields,i,fieldName,field
  if $text(@pLabel)="" quit 0
  set meta=$text(meta)
  set sep=";~"
  set fields=$text(@pLabel)
  for i=2:1:$length(meta,sep) set fieldName=$piece(meta,sep,i),field=$piece(fields,sep,i) set @fieldName=field
  quit 1
setFields(pLabel,pFields) ; w $$setFields^ztest22("convertIn",.fields),!
  kill pFields
  set pFields("label")=pLabel
  set pFields("comment")=comment
  set pFields("dt")=dt
  quit 1
init(pInsertAfterN) ; do init^ztest22(0)
  set sc=$$createRoutine("ztestBigMac",32000,1900)
  if 'sc do $system.Status.DisplayError(sc)
  set sc=$$createProc("ztestLib",1500,200,1,.pInsertAfterN)
  if 'sc do $system.Status.DisplayError(sc)
  quit

/// pName - rtn name, pNLines - #lines, pNLabels - #labels, pInsertFunction - insert $$convertIn?, pInsertAfterNLabel - insert it after Nth label
createRoutine(pName,pNLines,pNLabels,pInsertFunction,pInsertAfterNLabel)
  new nPerLabel,nInsertAfterLabel
  set pInsertFunction=$get(pInsertFunction,0)
  set pInsertAfterN=$get(pInsertAfterN,pNLabels)
  do ##class(%Routine).Delete(pName_".MAC")
  set routine = ##class(%Routine).%New(pName_".MAC")
  ; Write lines of code to the routine
  do routine.WriteLine(pName)
  do routine.WriteLine("%New()")
  do routine.WriteLine(" quit """_pName_"""")
  set nPerLabel=pNLines\pNLabels-3 set:nPerLabel<=0 nPerLabel=1
  #; set nInsertAfterLabel=pInsertAfterN\nPerLabel
  if pInsertFunction&&(pInsertAfterNLabel=0) {
    do routine.WriteLine("convertIn(str,table)")
    do routine.WriteLine(" new handle,strout")
    do routine.WriteLine(" set strout=$zconvert(str,""I"",table,handle)")
    do routine.WriteLine(" quit strout_$select(handle="""":"""",1:""?"")")
    set pInsertFunction=0
  }
  for i=1:1:pNLabels {
    do routine.WriteLine("a"_i_"()")
    do routine.WriteLine(" new a"_i_",b"_i_",c"_i_",d"_i_",e"_i)
    for j=1:1:nPerLabel {
      do routine.WriteLine(" set a"_i_"="_j_" set b"_i_"="_j_" set c"_i_"="_j_" set d"_i_"="_j_" set e"_i_"="_j)
    }
    do routine.WriteLine(" quit a"_i)
    if pInsertFunction&&(i>=pInsertAfterNLabel) {
      do routine.WriteLine("convertIn(str,table)")
      do routine.WriteLine(" new handle,strout")
      do routine.WriteLine(" set strout=$zconvert(str,""I"",table,handle)")
      do routine.WriteLine(" quit strout_$select(handle="""":"""",1:""?"")")
      set pInsertFunction=0
    }
  }
  if pInsertFunction {
      do routine.WriteLine("convertIn(str,table)")
      do routine.WriteLine(" new handle,strout")
      do routine.WriteLine(" set strout=$zconvert(str,""I"",table,handle)")
      do routine.WriteLine(" quit strout_$select(handle="""":"""",1:""?"")")
      set pInsertFunction=0
  }
  ; save the routine
  do routine.Save()
  ; compile the routine
  set sc=routine.Compile("k")
  quit sc

/// pName - rtn name, pNLines - #lines, pNLabels - #labels, pInsertFunction - insert $$convertIn?, pInsertAfterNLabel - insert it after Nth label
createProc(pName,pNLines,pNLabels,pInsertFunction,pInsertAfterNLabel)
  new nPerLabel,nInsertAfterLabel
  set pInsertFunction=$get(pInsertFunction,0)
  set pInsertAfterN=$get(pInsertAfterN,pNLabels)
  do ##class(%Routine).Delete(pName_".MAC")
  set routine = ##class(%Routine).%New(pName_".MAC")
  ; Write lines of code to the routine
  do routine.WriteLine(pName)
  do routine.WriteLine("%New() [] Public")
  do routine.WriteLine("{ quit """_pName_""" }")
  set nPerLabel=pNLines\pNLabels-3 set:nPerLabel<=0 nPerLabel=1
  #; set nInsertAfterLabel=pInsertAfterN\nPerLabel
  if pInsertFunction&&(pInsertAfterNLabel=0) {
    do routine.WriteLine("convertIn(str,table) [] Public")
    do routine.WriteLine("{ set strout=$zconvert(str,""I"",table,handle)")
    do routine.WriteLine(" quit strout_$select(handle="""":"""",1:""?"") }")
    set pInsertFunction=0
  }
  for i=1:1:pNLabels {
    do routine.WriteLine("a"_i_"() [] Public")
    do routine.WriteLine(" {")
    for j=1:1:nPerLabel {
      do routine.WriteLine(" set a"_i_"="_j_" set b"_i_"="_j_" set c"_i_"="_j_" set d"_i_"="_j_" set e"_i_"="_j)
    }
    do routine.WriteLine(" quit a"_i_" }")
    if pInsertFunction&&(i>=pInsertAfterNLabel) {
      do routine.WriteLine("convertIn(str,table) [] Public")
      do routine.WriteLine("{ set strout=$zconvert(str,""I"",table,handle)")
      do routine.WriteLine(" quit strout_$select(handle="""":"""",1:""?"") }")
      set pInsertFunction=0
    }
  }
  if pInsertFunction {
    do routine.WriteLine("convertIn(str,table) [] Public")
    do routine.WriteLine("{ set strout=$zconvert(str,""I"",table,handle)")
    do routine.WriteLine(" quit strout_$select(handle="""":"""",1:""?"") }")
    set pInsertFunction=0
  }
  ; save the routine
  do routine.Save()
  ; compile the routine
  set sc=routine.Compile("k")
  quit sc
convertInLocal(str,table)
  #; quit $$convertIn^ztestLib(str,table)
  new handle,strout
  set strout=$zconvert(str,"I",table,handle)
  quit strout_$select(handle="":"",1:"?")

meta ;~category;~pre;~N;~sample;~post;~comment;~dt
convertInFF ;~convert;~;~pN;~set convertInx=$$convertIn^ztestLib(^mtempFF(i),"UTF8");~;~looping $$convertIn^ztestLib(^mtempFF(i)...)
convertInFFBigMac ;~convert;~set convertName=$$%New^ztestBigMac();~pN;~set convertInx=$$convertIn^ztestLib(^mtempFF(i),"UTF8");~;~%New^ztestBigMac + looping $$convertIn^ztestLib(^mtempFF(i)...)
convertInRunFar ;~convert;~set convertIny=$zcvt("Маленькая умная Коричневая Лиса прыгает через лежащую сонную Пятнистую Собаку","o","UTF8");~pN;~set convertInx=$$convertIn^ztestLib(convertIny,"UTF8");~;~looping $$convertIn^ztestLib (far)
convertInBigMacFar ;~convert;~set convertName=$$%New^ztestBigMac() set convertIny=$zcvt("Маленькая умная Коричневая Лиса прыгает через лежащую сонную Пятнистую Собаку","o","UTF8");~pN;~set convertInx=$$convertIn^ztestLib(convertIny,"UTF8");~;~%New^ztestBigMac, looping $$convertIn^ztestLib (far)

Hi Timo,

Thank you for reply. My routine cache is big enough for 1000+ routines, besides the issue was reproduced on several computers, under Linux as well. Alas, when I try to reproduce it using synthetic samples it does not show significant performance drawback, no more than 5%. More realistic sample I have shows better results, but it is too big to share it with DC.
So I'm closing the loop.

Best regards!

When several instances are connected to remote DB, then all their locks are managed in the IRIS instance where this DB

That's only partially true: they are kept in both LOCKTABs, on data server and on app server as well.
This behavior can be easily checked in Management Portal. So if one of app servers makes great trouble with LOCKs, it can affect other servers.

In other words, it's a kind of performance optimization when using $increment

I was surprised when discovered that $increment with locals is not so well optimized as it can be and as it is for globals. A small proof:

USER> set top=1E7,^||a=0,t=$zh for i=1:1:top {set ^||a=$get(^||a)+1} w $fn($zh-t*1E6/top,"",3)
0.265
USER> set top=1E7,^||a=0,t=$zh for i=1:1:top {do $i(^||a)} w $fn($zh-t*1E6/top,"",3)
0.176
USER> set top=1E7,a=0,t=$zh for i=1:1:top {set a=$get(a)+1} w $fn($zh-t*1E6/top,"",3)
0.050
USER> set top=1E7,a=0,t=$zh for i=1:1:top {do $i(a)} w $fn($zh-t*1E6/top,"",3)
0.056

I intentionally inserted $get() call just to make samples functionality closer to each other.

It's rather difficult to get for what reason $i(a) is not just the same as `set a=$get(a)+1` as it's nothing to do with TP and other globals related stuff here?

Maybe nobody was interested in deeper optimization of $i() in this context...

Eduard,
what is the reason of having nested transactions inside the Worker method?
And how can you distribute single ("root") transaction execution and control among several processes?

I'd take another approach:

  • work manager master process is just distributing work items among workers and waiting for complete
  • each work item is just preparing data in some ^IRISTEMP* global w/o writing to database and reports the status to the master process
  • master process checks completion status to make a choice:
    • to TS, store ^IRISTEMP* inside application database, TC, kill ^IRISTEMP*
    • or just to kill ^IRISTEMP*.

Pros: it can be implemented using WQM.
Cons: huge amount of temporary data can be written into IRISTEMP database, but if the results of work items can be processed separately, master process can do it without waiting for all items completeness killing temporary subglobals one by one.