User bio

Worked at InterSystems since 2001 in development, product management, and support.

Expertise: core database engine; data integrity and performance thereof; high availability & disaster recovery; ObjectScript programming and runtime internals.

Member since Dec 7, 2015
Replies:

I said that only because managing database size with SLM can be painful operationally: you have to predict where the growth is going to be and coordinate a configuration change before the application starts using the new mapping range. I did not mean to imply that anything bad happens when you do this. In fact, if the growth of a global isn't bounded by some natural data lifespan or some application-level archival process, then SLM is unavoidable at a sufficient rate of growth. By planning for the growth in advance, though, and starting with the largest expected globals mapped to their own databases, you might stave that off for a long time.

Note: there's a small runtime cost to resolving SLM that doesn't exist for (whole) global mapping, but it's generally a noise-level cost unless you've generated a very complex set of mappings (more complex than you'd likely create as a manual configuration step).

From my perspective, the main reason to run integrity check is so that if you ever did have database degradation, you know that you have a backup you can recover from. I've seen too many disasters where the corruption that's discovered predates any available backup. For use cases that would never recover from backup, mirrored copies, or the like for disaster recovery, you might reasonably argue that integrity check isn't worth the effort/cost.

(As a detail, just accessing a corrupted global won't hang the system, but the system will hang if corruption causes a SET or KILL to fail in the middle of a multi-block update.)

Anyway, to your good thoughts about possible enhancements:

  • It turns out that one of my recent enhancements, as yet unreleased, did open up the possibility of a "pointer block only" check (as a side effect of a different goal). However, I don't think it's very valuable, because pointer blocks make up a very small fraction of all blocks. For typical patterns of subscripts, there are in the neighborhood of 300-500 data blocks pointed to by each pointer block in 8KB databases, so you're talking about ~0.2% of all the blocks (see the arithmetic sketch after this list). I don't think you'd draw any meaningful conclusion from a clean check that didn't include data blocks. Don't be misled by the fact that most integrity check errors start with "Error while processing pointer block %d". That's just the way integrity check works. The vast majority of those are from a bottom pointer block and were found only because integrity check read every data block under it to find the inconsistency.
  • We do actually have some protection against errors in the most recently written blocks (following a crash) via the Write Image Journal block comparison. It's a totally different mechanism, but it is designed with the thought that when systems lose power, there's some history of drives dropping or corrupting the most recent writes, despite having promised (via our very careful use of fsync() and similar mechanisms) that those writes had already succeeded.
  • About piggy-backing on incremental change tracking: it's an interesting idea, but again I worry that many of the failure modes that lead to corruption wouldn't necessarily get uncovered, so it doesn't give the guarantee you need from integrity check in order to know that a backup image could be relied upon in a disaster.
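
To put a rough number on the "~0.2%" figure in the first bullet, here's a minimal back-of-the-envelope sketch (my own arithmetic, assuming only the 300-500 data blocks per pointer block cited above):

```python
# Rough arithmetic behind the pointer-block fraction (illustrative assumption:
# each 8KB pointer block references on the order of 300-500 data blocks).
for data_blocks_per_pointer in (300, 500):
    # +1 counts the pointer block itself alongside the data blocks it points to
    fraction = 1 / (data_blocks_per_pointer + 1)
    print(f"{data_blocks_per_pointer} data blocks per pointer block -> "
          f"pointer blocks are about {fraction:.2%} of all blocks")
```

Which is why a clean pointer-block-only pass says very little about the other ~99.7% of the blocks.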

I think integrity check isn't the primary driver of that architectural decision, but it might be part of the consideration. Any single database is constrained to a max size of 2^32 blocks, so 32TB at the standard 8KB block size. There are practical reasons not to go anywhere near that high: backup/restore and other operational tasks on a single database may be more onerous; AIX/JFS2 has a 16TB file limit anyway; integrity check has less ability to be parallelized if the huge database is also primarily a single global; and if you're running older versions, there are a couple of bugs involving databases that have more than 2^31 blocks, all fixed in the latest maintenance kits.
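
As a quick sanity check on that 32TB ceiling (my own arithmetic, using only the 2^32 block limit and 8KB block size stated above):

```python
# Maximum size of a single database: 2^32 blocks at the standard 8KB block size.
max_blocks = 2 ** 32
block_size_bytes = 8 * 1024
max_tb = max_blocks * block_size_bytes / 2 ** 40
print(f"{max_tb:.0f} TB")   # -> 32 TB
```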

Given these and other considerations, I believe most sites shoot for max database sizes somewhere between 2 and 10 TB. So for 100TB we're talking about a few dozen databases. You'd hope that much data, especially if it's largely in active use, is spread over a significant number of different globals (e.g. many tables and their indices). Ideally you use global mappings in anticipation of such huge growth to organize the globals into databases, and as much as possible avoid the need to use subscript level mapping (SLM) to manage growth of a single global across multiple databases. If growth is unbounded, though (i.e. this isn't the sort of data that can eventually be moved to some separate archive structure), then subscript level mapping to spread a global across these dozen or more databases becomes inevitable.
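
And the database-count arithmetic from that paragraph, spelled out (again just my restatement of the 2-10 TB per database and 100TB figures above):

```python
total_tb = 100
for per_db_tb in (2, 10):
    print(f"{per_db_tb} TB per database -> about {total_tb // per_db_tb} databases")
# 10 TB per database -> about 10 databases; 2 TB -> about 50,
# hence "a few dozen" somewhere in between.
```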

As for running integrity check on that much data, it will take some substantial time, and you need to find the balance of how frequently you want to run it, how much storage bandwidth it's reasonable for it to consume, and whether you can run it on an offline copy. Since the other factors I mentioned already put you into having a multitude of separate databases (with any giant globals spread over some number of them via SLM), integrity check will parallelize well.
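
To illustrate the kind of balancing act I mean, here's a hedged planning sketch; the throughput and parallelism figures are placeholders I made up, not measurements, and you'd substitute numbers from your own storage and your own integrity check runs:

```python
# Hypothetical planning numbers -- replace with your own measurements.
total_data_tb = 100        # total data to be checked
parallel_jobs = 12         # e.g. one integrity check process per database
mb_per_sec_per_job = 200   # read bandwidth you're willing to let each job consume

aggregate_mb_per_sec = parallel_jobs * mb_per_sec_per_job
hours = (total_data_tb * 1024 * 1024) / aggregate_mb_per_sec / 3600
print(f"~{hours:.0f} hours of wall-clock time at {aggregate_mb_per_sec} MB/s aggregate")
```

The same model also shows why running it against an offline copy is attractive: the bandwidth cost moves off the production storage.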
