When several instances are connected to remote DB, then all their locks are managed in the IRIS instance where this DB

That's only partially true: the locks are kept in both LOCKTABs, on the data server and on the app server as well.
This behavior is easy to check in the Management Portal. So if one of the app servers gets into serious trouble with LOCKs, it can affect the other servers.

In other words, it's a kind of performance optimization when using $increment

I was surprised to discover that $increment on locals is not as well optimized as it could be, and as it is for globals. A small proof (the figures are microseconds per iteration):

USER> set top=1E7,^||a=0,t=$zh for i=1:1:top {set ^||a=$get(^||a)+1} w $fn($zh-t*1E6/top,"",3)
0.265
USER> set top=1E7,^||a=0,t=$zh for i=1:1:top {do $i(^||a)} w $fn($zh-t*1E6/top,"",3)
0.176
USER> set top=1E7,a=0,t=$zh for i=1:1:top {set a=$get(a)+1} w $fn($zh-t*1E6/top,"",3)
0.050
USER> set top=1E7,a=0,t=$zh for i=1:1:top {do $i(a)} w $fn($zh-t*1E6/top,"",3)
0.056

I intentionally inserted the $get() call just to make the samples functionally closer to each other.

It's rather hard to see why $i(a) is not exactly as fast as `set a=$get(a)+1`, since TP and other global-related machinery have nothing to do with locals.

Maybe nobody was interested in deeper optimization of $i() in this context...

Eduard,
what is the reason for having nested transactions inside the Worker method?
And how can you distribute the execution and control of a single ("root") transaction among several processes?

I'd take another approach:

  • the work manager master process just distributes work items among the workers and waits for completion
  • each work item just prepares its data in some ^IRISTEMP* global without writing to the database and reports its status to the master process
  • the master process checks the completion status and makes a choice:
    • either TS, store ^IRISTEMP* in the application database, TC, kill ^IRISTEMP*
    • or just kill ^IRISTEMP*.

Pros: it can be implemented using WQM.
Cons: a huge amount of temporary data can be written to the IRISTEMP database. However, if the results of the work items can be processed separately, the master process can handle them without waiting for all items to complete, killing the temporary subglobals one by one.
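A minimal sketch of this staging approach, under stated assumptions: the class name Demo.StagedWork, the staging global ^IRISTEMP.StagedWork and the target global ^Demo.Result are all hypothetical, and the "work" is a placeholder string. It uses %SYSTEM.WorkMgr with WaitForComplete(); newer IRIS versions may prefer Sync() for the same step.

```objectscript
/// Hypothetical class: every name in it is an illustrative assumption.
Class Demo.StagedWork Extends %RegisteredObject
{

/// Worker: stages its result in ^IRISTEMP only, never touches the application database.
ClassMethod Worker(master As %Integer, item As %Integer) As %Status
{
    set ^IRISTEMP.StagedWork(master, item) = "result for item "_item
    quit $$$OK
}

/// Master: distributes the items, waits, then either commits or discards the staged data.
ClassMethod Run(count As %Integer = 10) As %Status
{
    set sc = $$$OK
    set master = $job                       // keeps concurrent masters apart
    kill ^IRISTEMP.StagedWork(master)
    set queue = $system.WorkMgr.%New()
    for item = 1:1:count {
        set sc = queue.Queue("##class(Demo.StagedWork).Worker", master, item)
        quit:$$$ISERR(sc)
    }
    // An error status here means at least one work item failed
    set:$$$ISOK(sc) sc = queue.WaitForComplete()
    if $$$ISOK(sc) {
        try {
            tstart
            merge ^Demo.Result = ^IRISTEMP.StagedWork(master)
            tcommit
        } catch ex {
            trollback 1
            set sc = ex.AsStatus()
        }
    }
    // The staged data is removed in both the success and the failure case
    kill ^IRISTEMP.StagedWork(master)
    quit sc
}

}
```

The only transaction lives in the master process, so no worker ever needs to take part in it; the price is the extra copy from IRISTEMP into the application database.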

The ^SPOOL(docIdx) approach allows spool documents to be shared across an ECP network, while ^SPOOL($j) does not :)

%IS is a utility that lets users of CHUI routines choose devices, and the %SPOOL utility can only manipulate spool files that were opened through %IS.

It seems that the original purpose of the spooling facility has gone along with such users and such printers (strictly character-based, without font selection, etc.), though it can still be useful in rather exotic cases like this one.

Yeah, but it should be used with caution. Let's see what happens when two processes access the spooling device concurrently:

USER>f {q:'($zh\1#20)} k a o 2 f i=1:1:10 {s a(i)=i*100} u 2 zw a c 2 ; process #1
USER>s i="" f  {s i=$o(^SPOOL(1,i),1,line) q:i=""  w line} ; look inside ^SPOOL(1)...
a(1)=100
a(2)=200
a(3)=300
a(4)=400
a(5)=500
a(6)=600
a(7)=700
a(8)=800
a(9)=900
a(10)=1000
{66892,65205{11
USER>f {q:'($zh\1#20)} k a o 2 f i=1:1:10 {s a(i)=i} u 2 zw a c 2 ; process #2
USER>s i="" f  {s i=$o(^SPOOL(1,i),1,line) q:i=""  w line} ; look inside ^SPOOL(1)...
a(1)=100
a(2)=200
a(3)=300
a(4)=400
a(5)=500
a(6)=600
a(7)=700
a(8)=800
a(9)=900
a(10)=1000
{66892,65205{11

As you can see, one process's output replaced the other's. To avoid this, first RTFM ("OPEN and USE Commands for Spooling Device"), then implement some synchronization pattern, e.g.

USER>f {q:'($zh\1#20)} s docIdx=$i(^SPOOL) k a o 2:(docIdx) f i=1:1:10 {s a(i)=i*100} u 2 zw a c 2 ; process #1
USER>s i="" f  {s i=$o(^SPOOL(docIdx,i),1,line) q:i=""  w line}
a(1)=100
a(2)=200
a(3)=300
a(4)=400
a(5)=500
a(6)=600
a(7)=700
a(8)=800
a(9)=900
a(10)=1000
{66892,66645{11{
USER>f {q:'($zh\1#20)} s docIdx=$i(^SPOOL) k a o 2:(docIdx) f i=1:1:10 {s a(i)=i} u 2 zw a c 2 ; process #2
USER>s i="" f  {s i=$o(^SPOOL(docIdx,i),1,line) q:i=""  w line}
a(1)=1
a(2)=2
a(3)=3
a(4)=4
a(5)=5
a(6)=6
a(7)=7
a(8)=8
a(9)=9
a(10)=10
{66892,66645{11{

Each process now has its own output in ^SPOOL(docIdx). This approach works if every consumer of the spooling facility follows the same pattern. Incrementing ^SPOOL is just the easiest way to do it; it would be better to avoid touching system globals and to increment something else of your choice.
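A possible variation that draws the document number from an application-owned counter instead of ^SPOOL itself. This is only a sketch: the class Demo.Spool, the method SpoolLines and the counter global ^MyApp.SpoolIdx are hypothetical names, and the scheme stays safe only as long as every consumer of the spooling device takes its numbers from the same counter.

```objectscript
/// Hypothetical helper class; ^MyApp.SpoolIdx is an assumed application-owned counter.
Class Demo.Spool Extends %RegisteredObject
{

/// Writes each element of the list as one spool line and returns the document number.
ClassMethod SpoolLines(lines As %List) As %Integer
{
    set docIdx = $increment(^MyApp.SpoolIdx)   // atomic, so two processes never get the same number
    set io = $io                               // remember the current device
    open 2:(docIdx)
    use 2
    for i = 1:1:$listlength(lines) { write $list(lines, i), ! }
    close 2
    use io                                     // restore the previous device
    quit docIdx
}

}
```

It could be called as `set docIdx=##class(Demo.Spool).SpoolLines($lb("first","second"))`, after which the output can be read from ^SPOOL(docIdx) as in the examples above.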

I'd like to collect tables based on logs

In general, such an approach is far from optimal for Caché-like databases, because logs (usually called "journals" in the Caché/M world) are written at the global (= lowest possible) level, regardless of the data model used by the app (SQL, persistent classes, etc.). Reconstructing app-level data from the underlying globals can be a tricky task even for a Caché guru.

That was one of the reasons why colleagues of mine took another approach to near-real-time data export from Caché to an external system. In a few words, they used triggers to form application-level data packets on the Caché side and pipe them to the receiver. This approach saved the CPU time that would otherwise be wasted on filtering out unnecessary journal records, and it minimized cross-system network traffic.

Robert, it sounds strange, but...

Setting the Time Zone

You can use $ZTIMEZONE to set the time zone used by the current InterSystems IRIS process. Setting $ZTIMEZONE does not change the default InterSystems IRIS time zone or your computer’s time zone setting.

IMHO, setting $ztimezone is dangerous not because of a system-wide effect (it has none) but mostly because of its exceptions and anomalies, even though they are accurately listed in the docs.