Question
· Jan 12, 2020

Timeout for $zf

In one of our projects, which uses ECP with 10 ECP application servers, we occasionally run into journals that fail to purge because of open transactions. With about 100-150 GB of journal files per day this quickly becomes a big issue, and with mirroring a very big one. So far we have mostly just rebooted our ECP data server so that it rolls back any open transactions, but that process is too long and can take a few hours. I did not find any way to get the list of open transactions in one place on the ECP data server. We just migrated our data server to 2018.1, while our app servers are still on 2012.2 for various reasons, and we can't migrate them.

I found a useful query, %SYS.Journal.Transaction:List, but 2012.2 does not have it, and it is useless with ECP: it works only for local processes (or that may be due to the outdated app servers). So I had to connect to each server and use %SYS.ProcessQuery to look for any process with open transactions, and I found one: a CSP session process that called one of our external tools with $ZF(-1) and has been hanging on that line for a few days already because of errors in the external tool.

It looks like neither $zf(-1) nor even the newest $zf(-100) offers any timeout option. What would you recommend in this case to prevent $zf(-1) from hanging for days and limit it to minutes?

Also, can anybody say how to easily monitor for a transaction that has been open for a long time, and find where it was started, in an ECP configuration?

Discussion (4)

Hi Dmitriy,

$ZF(-1) has no chance for a timeout; it is strictly synchronous.

So $ZF(-2) and looping for a result might be a workaround.
$ZF(-100,"/ASYNC", ...) may do the same. See details

Both need to run the external routine in a script that documents its completion in some file, which you then check.
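
The polling workaround described above could look roughly like the following. This is only a minimal sketch under stated assumptions: the method name RunWithMarker, the marker file path, and the idea that the command is a wrapper script which creates the marker file as its last step are all hypothetical and not taken from the thread. It uses $ZF(-2) rather than $ZF(-100) so that it could also run on the older app servers.

```objectscript
/// Hypothetical helper: start a command asynchronously and wait up to
/// timeout seconds for a marker file that the wrapper script creates when done.
/// Returns 1 if the marker appeared in time, 0 otherwise.
ClassMethod RunWithMarker(command As %String, marker As %String, timeout As %Integer = 60) As %Boolean
{
    // Remove a stale marker left over from a previous run, if any
    if ##class(%File).Exists(marker) { do ##class(%File).Delete(marker) }
    // $ZF(-2) starts the command and returns without waiting for it;
    // a non-zero result is treated here as "could not start"
    set rc = $zf(-2, command)
    if rc '= 0 { quit 0 }
    set end = $zh + timeout
    set done = 0
    // Poll once per second until the marker shows up or the deadline passes
    for {
        if ##class(%File).Exists(marker) { set done = 1 quit }
        if $zh > end { quit }
        hang 1
    }
    quit done
}
```

Note that this sketch only stops waiting. It does not terminate the external process, so a hung tool would still need to be cleaned up at the operating system level, but the calling process is free to roll back or commit its transaction.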

A different approach could be a Command Pipe (CPIPE) where you read the result with a timed READ.
It's basically the same

A warning about long-open transactions is written to cconsole.log and alerts.log (both files are in the $system.Util.InstallDirectory()_"mgr" directory); at least this is what I see on a customer's system. The entries look like this:

04/05/19-02:31:34:470 (21012) 1 [SYSTEM MONITOR] TransOpenSecs Warning: One or more transactions open longer than 10 minutes. Process id  18624 22208 (only top 5 shown)
08/29/19-14:31:03:802 (18576) 1 [SYSTEM MONITOR] TransOpenSecs Warning: One or more transactions open longer than 10 minutes. Process id  4128 (only top 5 shown)
09/04/19-17:16:19:090 (21344) 1 [SYSTEM MONITOR] TransOpenSecs Warning: One or more transactions open longer than 10 minutes. Process id  25872 (only top 5 shown)

So it should be as easy as monitoring those two log files. But what this looks like on an ECP server, don't ask me.
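
As an illustration of monitoring the log described above, here is a small sketch that prints every TransOpenSecs warning found in cconsole.log. The method name ShowLongTransactions is hypothetical, and the sketch assumes that the log lives in the manager directory returned by $system.Util.ManagerDirectory() and that the warning text contains the string "TransOpenSecs", as in the entries shown. It would have to be run on each server separately.

```objectscript
/// Hypothetical helper: print every TransOpenSecs warning found in cconsole.log.
ClassMethod ShowLongTransactions() As %Status
{
    // Assumption: cconsole.log is located in the instance's mgr directory
    set path = ##class(%File).NormalizeFilename("cconsole.log", $system.Util.ManagerDirectory())
    set stream = ##class(%Stream.FileCharacter).%New()
    set sc = stream.LinkToFile(path)
    if $$$ISERR(sc) { quit sc }
    while 'stream.AtEnd {
        set line = stream.ReadLine()
        // Keep only the System Monitor warnings about long-open transactions
        if line [ "TransOpenSecs" { write line, ! }
    }
    quit $$$OK
}
```

The process ids at the end of each matching line can then be looked up with %SYS.ProcessQuery on the same server.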

Hello @Dmitry Maslennikov ,

I've modified a 'legacy method' in order to implement a timeout.

Maybe the following code could help you:

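/// Run a command line<BR/>
/// Echos of command are stored in result argument <br/>
/// Return 0 if a timeout occurs.
ClassMethod runCmdWithTimeout(
    command As %String,
    ByRef result As %Binary,
    timeout As %Integer) As %Boolean
{
    // Compute the deadline, then reuse timeout as the "timed out" flag
    set end = $zh+timeout, timeout = 0
    // Keep a command in $et that restores the $zu(68,40) end-of-file setting
    set a=$zu(69,40) new $et set $et="s a=$zu(68,40,"_a_")"
    // Open the command as a read pipe; with $zu(68,40,1) end of file sets $zeof
    open command:"qr" set a=$zu(68,40,1)
    // Read with a 1-second timed READ until end of output or the deadline
    use command for i=1:1 read line:1 set:line'="" result(i)=line set:$zh>end timeout=1 quit:$zeof||timeout
    close command
    // Restore the saved end-of-file setting
    xecute $et
    quit 'timeout
}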