Please share your experiences of structural database integrity checking / repair

During a Caché system management training course today we discussed structural database integrity (a.k.a. physical integrity) and the tools InterSystems provides for checking integrity and fixing problems.

It started me wondering how many sites regularly check the structural integrity of their databases, and whether loss of this integrity (a.k.a. database degradation) still gets encountered from time to time.

If you have any views or experiences to share, please add them here as comments.

  • 0
  • 0
  • 315
  • 4

Comments

Recent InterSystems Cache works rather well and has a tendency to make us lazy about checking the DB. SQL, Mongo, etc. DB admins must check/fix/maintain DBs on a regular basis.

For our smallish 100GB-200GB Cache DBs we scheduled daily background integrity checking but would typically find small damage only once every 3 years or more due primarily to hardware failures.

Cache appears to be incredibly resilient compared to other DBs.

Tom Fitzgibbon | gototomAtG...l.com | 917-933-1226

It might sound a bit of a far fetched idea...

After recent (minor) database corruptions caused by VM host activities I did wonder if a future Caché version could be made to self heal itself by using its mirror member.

For integrity checking only ^Integrity tool from InterSystems. And it should check database periodically, but it is not enough just to start periodically checking database. Backing up database also should be used, as well, as journalling for all time since the latest full backup was planned. Every day checking or every week depends on how big is your database, how much time it takes.

Repair, it is another story, I don't know any automatic tools which can do such task. It is only manual work, with some knowledge about the structure of Caché Blocks. I did it multiple times. For one customer I repaired database from corrupted backup. They used the same disk for database and backups, and after issues with this disk, one of the backups appeared to have more valuable data available than the database, but they lost some working days anyway. For repair database, the very useful tool is ^REPAIR from InterSystems. But sometimes even this tool could say that there is no database in this file. Once I succeeded with repairing database when some blocks were changed in their position, and the first block was deleted, and ^REPAIR said it is not a database.

Our customers are to check DB integrity on regular basis, usually weekly, while I don't remember a case when it showed errors which were not evidient without it (<DATABASE> errors in error and console logs, etc).

Last time when I had an opportunity to use ^REPAIR was about 1.5 year ago, when our support specialist defragmented free space in a database under Caché 2015.1.2. The bulletin from InterSystems about the possibility of defragmentation errors arrived a bit later... Thanks to backup performed before the defragmentation, the opportunity to use ^REPAIR was closed that time :) After upgrade to 2015.1.4 no errors of such kind were detected in the field.

The faults of Integrity check are:
- when there is some concurrent users' activity it may provide false positives in per database summary report (Errors found in database...) while there are no real errors neither in database nor in per global report;
- (mostly about TASKMGR): there is no way to include into the task completion reports (which can be e-mailed) any information from the task, e.g., about errors found by Integrity.