Question
· Nov 17, 2023

How to find out details on what caused an automatic transaction rollback.

I'm working for an organisation that is running a very old version of InterSystems Cache (5.016) on AIX. The last two times we have rebooted Cache, we have encountered rollbacks. I've been asked two questions. During the rollback it was "How long is it going to take?" and after the system returned, it was "So what caused it?". My answer to both was "I don't know".

I have looked at ^JRNDUMP to see if the "before" and "after" journal files would give me an idea of how to answer the second question, but the last entry of the "before" journal was at 15:15 (the time Cache shut down) and the first entry of the "after" file was at 15:43, which was when the rollback had completed. The entries in 'cconsole.log' are as follows:

11/16-15:14:56:171 (34996582) Write daemon started.
11/16-15:14:59:287 (39977406) Performing Journal Recovery
11/16-15:14:59:293 (39977406) Performing Transaction Rollback
11/16-15:15:00:339 (39977406) Max Journal Size: 1073741824
11/16-15:15:00:339 (39977406) START: /cache/db2/journal/20231116.027
11/16-15:43:34:854 (39977406) Journaling selected globals to /cache/db2/journal/20231116.027 started

I'm assuming that later versions of Cache have some kind of 'rollback' logging, but I can't find anything of use in this particular version.

If anyone has any suggestions, they'd be gratefully received :-)
Many thanks, Mike.

$ZV: 5.016.5902.0
Discussion (4)

Mike

I did not test this on 5.0.16, but I don't think this code has changed much, so I expect it will work for you.

In one terminal I started a transaction and set a global:

USER>tstart

TL1:USER>s ^bjb("TSTART")=1
 
TL1:USER>

I got the PID from the terminal window, and then in a second terminal I opened that process with %SYS.ProcessQuery:

USER>s p=##class(%SYS.ProcessQuery).%OpenId(11772)
 
USER>w p.InTransaction
197264

If the process is in a transaction, the value is greater than 0. This value is the offset in the journal file where the transaction started. If the journal file switches, the offset does not change, so it is not clear which file to look in when you are hunting for long transactions.

To find long transactions you would need to write some code that runs every x minutes, loops over all the processes, and checks whether any are in a transaction. If any are, save the current journal file, the offset, and the time. You would then be able to find transactions that have been open for a long time and look at the journal file to figure out what they are trying to do and why the transaction has not been committed.
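
The polling idea above could be sketched roughly as follows. This is only an illustration, not tested on 5.0.16: it assumes that %SYS.ProcessQuery (with the InTransaction property shown earlier) and ##class(%SYS.Journal.System).GetCurrentFileName() are available on your version, and the routine name TXMON and the global ^TXMON are made-up names for the example.

```objectscript
TXMON ; Sketch: record processes that are currently inside a transaction
 ; Assumptions (not verified on 5.0.16): %SYS.ProcessQuery and
 ; %SYS.Journal.System exist; ^TXMON is a hypothetical global name.
 quit
 ;
SNAP ; Take one snapshot of all processes
 new pid,p,off,file,now
 set now=$zdatetime($horolog,3)
 set file=##class(%SYS.Journal.System).GetCurrentFileName()
 set pid=""
 for  set pid=$order(^$JOB(pid)) quit:pid=""  do
 . set p=##class(%SYS.ProcessQuery).%OpenId(pid)
 . quit:'$isobject(p)   ; process may have exited in the meantime
 . set off=p.InTransaction
 . quit:off'>0   ; 0 means the process is not in a transaction
 . ; First sighting: remember the time and the journal file current at that time
 . if '$data(^TXMON(pid,off)) set ^TXMON(pid,off)=$listbuild(now,file)
 . ; Every sighting: update the time the transaction was last seen open
 . set ^TXMON(pid,off,"last")=now
 quit
```

Calling `do SNAP^TXMON` every few minutes (for example from a background job that loops with a HANG) builds up ^TXMON. Entries whose first-seen and last-seen times are far apart point at long-running transactions. Note that the journal file stored is the one that was current at the first sighting, so if a journal switch happened between the TSTART and the first snapshot, the transaction actually begins in an earlier file.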