I worked with a customer last year who had about 500,000 messages per day. It took an hour to collect one day of data and 8.5 hours to process it on a dedicated m6i.large. They found 12,000 discrepancies, but used the SkipList feature and some SQL queries to sift through the results.

For example:

SELECT COUNT(*), M.ServiceOperation, M.DocumentType
FROM HS_InteropTools_HL7_Compare.Temp T
JOIN HS_InteropTools_HL7_Compare.Message M ON T.MessageRowID = M.ID
WHERE Identical = 0 AND M.RunIdentifier = 'Source'
GROUP BY M.ServiceOperation, M.DocumentType

A few thoughts: 

  1. I consider <FILEFULL> or <DSKFUL> errors to be risks to data integrity -- the system is unable to write everything it attempted. This is less of a risk with IRISTEMP, especially if the entire instance fails. However, if other databases are affected, you may have some physical or logical data integrity issues to resolve.
  2. Check your SQL query plans (Show Plan in the Management Portal, or the EXPLAIN command). I suspect you're generating some large temp tables if IRISTEMP is the database that's filling up your disk.
  3. IRISTEMP is special -- it tries to keep the blocks in memory as much as possible before writing them out to the disk. If you're exhausting your disk space because IRISTEMP grew too large, then Alexander is correct: some additional global buffers could help. However, if your load exceeds your capacity, the bag will eventually burst. 
  4. Setting a Maximum Size for IRISTEMP will not completely resolve the issue. However, it may make the situation more readily recoverable. If your IRISTEMP is on the same filesystem as your OS, it can be more difficult to recover without a restart. 
  5. Consider setting MaxIRISTempSizeAtStart to be able to recover more readily by having IRIS automatically reduce the size of IRISTEMP at startup (a configuration sketch follows this list).
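
A minimal sketch for that last item, using the Config API from the %SYS namespace -- this assumes the usual Config.Startup Get/Modify pattern, and the 10240 MB cap is purely illustrative (item 4's maximum size, by contrast, is set on the IRISTEMP database itself via the Management Portal or the SYS.Database class):

set props("MaxIRISTempSizeAtStart") = 10240 // MB; 0 (the default) disables the startup truncation
set sc = ##class(Config.Startup).Modify(.props) // persists to the CPF [Startup] section
do:'sc $system.Status.DisplayError(sc)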

I'd love to know more about your specific use case for using 32KB blocks for the database. In my experience, 8KB blocks are generally more adaptable unless you're exclusively working with atomic data elements that are that size or larger. IMHO, it's an instance-wide consideration rather than only a database-level one, because allocating 32KB buffers reserves a portion of memory for blocks of that size (see the sketch below).
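
To make that concrete: global buffer memory is carved up per block size by the globals line in the [config] section of the CPF, a comma-separated list of megabytes for 2KB, 4KB, 8KB, 16KB, 32KB, and 64KB buffers in that order (verify the order against your version's documentation; the numbers below are illustrative, not recommendations):

[config]
globals=0,0,4096,0,1024,0

Here 4096 MB serves 8KB blocks and 1024 MB serves 32KB blocks; that 1024 MB is unavailable to your 8KB databases, which is exactly why the block size decision is instance-wide.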

I expect transactions that were rolled back would be missing a TCOMMIT in the journal file. Walking the journal files and counting TSTARTs vs. TCOMMITs should give you a rough number. It could be off by the number of transactions that happened to be open at the start of the period, and that error margin will vary significantly with activity. A rough sketch:
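
This assumes the %SYS.Journal.File record chain (FirstRecord/Next) and that begin/commit records report TypeName values of "BeginTrans" and "CommitTrans" -- zwrite a few records on your version first to confirm the exact strings:

set path = ##class(%SYS.Journal.System).GetLastFileName() // or loop over the whole period's files
set file = ##class(%SYS.Journal.File).%OpenId(path)
set starts = 0, commits = 0
set rec = file.FirstRecord
while $isobject(rec) {
    if rec.TypeName = "BeginTrans" { set starts = starts + 1 }
    elseif rec.TypeName = "CommitTrans" { set commits = commits + 1 }
    set rec = rec.Next
}
write "TSTARTs: ", starts, "  TCOMMITs: ", commits, "  unmatched: ", starts - commits, !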

You should definitely avoid putting strange journal files in your local journal directory! 

As Dmitry suggested, the Journal APIs will let you read the contents of a journal file. You can also use the existing Journal Profile utility for a general sense of which globals are most active in the file.

set path = ##class(%SYS.Journal.System).GetLastFileName() // example; any journal file works
do ##class(%CSP.UI.System.OpenJournalPane).ComputeJournalProfile(path)
// the profile lands in a temp global whose name depends on the product version:
zw:$ZV["Cach" ^CacheTemp.JournalProfile(path)
zw:$ZV["IRIS" ^IRIS.Temp.JournalProfile(path)