I said that only because managing database size with SLM can be painful operationally: having to predict where the growth is going to be and coordinate a configuration change in advance of the new mapping range getting used by the application. I did not mean to imply that anything bad happens when you do this. In fact, if the growth of a global isn't bounded by some natural data lifespan, or some application-level archival process, then SLM is unavoidable with a sufficient rate of growth. By planning in advance for the growth, though, and starting the largest expected globals mapped to their own databases, you might stave that off for a long time.
Note: there's a little runtime cost to resolving SLM that doesn't exist for (whole) global mapping, but it's generally a noise-level cost unless you've generated a very complex set of mappings (more complex than you'd likely create as a manual configuration step).
From my perspective, the main reason to run integrity check is so that if you ever did have database degradation, you know that you have a backup that you can recover from. I've seen too many disasters in which the corruption, once discovered, turns out to predate any available backup. For use cases that would never recover from backup or mirrored copies or the like for disaster recovery, you might reasonably argue that integrity check isn't worth the effort/cost.
(As a detail, just accessing a corrupted global won't hang the system, but the system will hang if corruption causes a SET or KILL to fail in the middle of a multi-block update.)
Anyway, to your good thoughts about possible enhancements:
I think integrity check isn't the primary driver of that architectural decision, but it might be part of the consideration. Any single database is constrained to a maximum size of 2^32 blocks, so 32TB for the standard 8KB block size. There are practical reasons not to go anywhere near that high: backup/restore and other operational tasks on a single database may be more onerous; AIX/JFS2 has a 16TB file limit anyway; and integrity check has less ability to be parallelized if the huge database is also primarily a single global. (If you're running older versions, there are also a couple of bugs involving databases that have more than 2^31 blocks, all fixed in the latest maintenance kits.)
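To make the size arithmetic above concrete, here is a minimal sketch in Python. The function name and constants are just for illustration (they are not part of any product API), and it treats KB/TB as binary units (1 KB = 1024 bytes), which is how the 32TB figure works out.

```python
# Illustrative arithmetic only; names here are hypothetical, not a product API.
KIB = 1024
TIB = 1024 ** 4

def max_database_bytes(block_size_bytes: int, max_blocks: int = 2 ** 32) -> int:
    """Upper bound on database size: block count limit times block size."""
    return block_size_bytes * max_blocks

# Hard limit for the standard 8KB block size.
limit_8k = max_database_bytes(8 * KIB)
print(limit_8k // TIB, "TB at 2^32 blocks")

# Size at which an 8KB-block database crosses 2^31 blocks
# (the threshold relevant to the older-version bugs mentioned above).
threshold_8k = max_database_bytes(8 * KIB, max_blocks=2 ** 31)
print(threshold_8k // TIB, "TB at 2^31 blocks")
```

Note that with 8KB blocks the 2^31-block threshold lands at 16TB, the same figure as the AIX/JFS2 file limit.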
Given these and other considerations, I believe most sites shoot for max database sizes somewhere between 2 and 10 TB. So for 100TB we're talking about a few dozen databases. You'd hope that much data, especially if it's largely in active use, is spread over a significant number of different globals (e.g. many tables and their indices). Ideally you use global mappings in anticipation of such huge growth to organize the globals into databases and, as much as possible, avoid the need to use subscript level mapping (SLM) to manage growth of a single global across multiple databases. If growth is unbounded, though (i.e. this isn't the sort of data that can eventually be moved to some separate archive structure), then subscript level mapping to map across these dozen or more databases becomes inevitable.
As for running integrity check on that much data, it will take some substantial time, and you need to find the balance of how frequently you want to run it, how much storage bandwidth is reasonable for it to consume, and whether you can run it on an offline copy. Since the other factors I mentioned already push you toward having a multitude of separate databases (with any giant globals spread over some number of them via SLM), integrity check will be able to be well parallelized.
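As a rough sanity check on the database count, a small Python sketch follows. The helper name is made up for illustration, and it assumes data could be packed evenly up to the per-database target, which real global mappings won't achieve exactly.

```python
import math

def databases_needed(total_tb: float, max_db_tb: float) -> int:
    """Minimum number of databases if each is capped at max_db_tb.

    Assumes perfectly even packing; treat the result as a lower bound.
    """
    return math.ceil(total_tb / max_db_tb)

total_tb = 100
for cap_tb in (2, 5, 10):
    print(f"{cap_tb} TB cap -> at least {databases_needed(total_tb, cap_tb)} databases")
```

So the 2 to 10 TB target range gives somewhere between 10 and 50 databases for 100TB, before allowing any headroom for growth.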