Question
Raghuram Devarakonda · Apr 4, 2017

Does the ExternalFreeze command update the data files before suspending the write daemon?

 

Hi,

The command "##Class(Backup.General).ExternalFreeze()" switches journal files as can be seen in the following log message:

"Journaling switched to: /cache/mgr/journal/20170331.003

Backup.General.ExternalFreeze: Start a journal restore for this backup with journal file: /cache/mgr/journal/20170331.003"

It is not clear to me whether the data files are updated with all the transactions from journals prior to this new journal file before the write daemon is suspended. Put another way, if I back up the data files after the Freeze command and then restore all of them (say, on another machine), is it mandatory to do a journal recovery if I am not interested in any data after the Freeze?

I hope that the wording is not too complicated. Any help is appreciated.

Thanks,

Raghu


Upon return from ExternalFreeze(), the CACHE.DAT files will contain all of the updates that occurred prior to when it was invoked.  Some of those updates may, in fact, be journaled in the file that was switched to (the .003 file in your example), though that doesn't really matter for your question.

BUT, you still need to do a journal restore, in general, because the backup image may contain partially committed transactions and journal restore is what rolls them back, even if the image of journals that you have at restore time contains no newer records than the CACHE.DAT files do. This is covered in the Restore section of the documentation, which I recommend having a look at: http://docs.intersystems.com/latest/csp/docbook/DocBook.UI.Page.cls?KEY=...

There is an exception to this, and that is if you are restoring a crash-consistent snapshot of the entire system, including all CACHE.DAT files, the manager directory, journals, and the WIJ. In that case, all the crash-consistency guarantees that the WIJ and journals confer mean that when you start that restored image, the usual startup recovery actions will take care of any required roll forward and roll back from journals automatically. In that scenario with crash-consistent snapshots, ExternalFreeze() wasn't even needed to begin with, because a crash-consistent snapshot is by definition good enough. However, ExternalFreeze() is typically used for planned external backups because it allows you to restore a subset of databases rather than requiring restore of the entire system.
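To illustrate why the rollback step matters even when the journals hold nothing newer than the databases, here is a toy Python sketch. It is not Caché code and does not reflect the real journal format or how journal restore is implemented; the record layout and the function name are made up for illustration. It only shows the idea that updates from a transaction with no commit record get undone.

```python
from typing import Dict, List, Tuple

# Hypothetical record layout: (transaction id, operation, key, old value, new value).
# An old value of None stands for "the key did not exist before this SET".
Record = Tuple[int, str, str, object, object]


def roll_back_open_transactions(db: Dict[str, object],
                                journal: List[Record]) -> Dict[str, object]:
    """Undo SETs that belong to transactions with no COMMIT in the journal."""
    committed = {txn for txn, op, *_ in journal if op == "COMMIT"}
    restored = dict(db)
    # Walk backwards so the oldest pre-image of each key is applied last.
    for txn, op, key, old, _new in reversed(journal):
        if op == "SET" and txn not in committed:
            if old is None:
                restored.pop(key, None)
            else:
                restored[key] = old
    return restored


journal = [
    (1, "SET", "a", None, 10),
    (1, "COMMIT", "", None, None),
    (2, "SET", "b", None, 20),   # transaction 2 never commits
    (2, "SET", "a", 10, 11),
]
backup_image = {"a": 11, "b": 20}  # what the frozen data files contain
print(roll_back_open_transactions(backup_image, journal))  # {'a': 10}
```

The backup image already holds every update, but transaction 2 was still open at freeze time, so a consistent restore has to remove its changes.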

Ray,

Thanks for the detailed answer. I do wish to use the Freeze command for the backup, so I will perform the journal restore. Based on your explanation, then, is it sufficient to do this restore using just the journal file that is switched to at the time of the Freeze?

Thanks,

Raghu

Raghu,

Generally, you will need only the one journal file, but if any transactions are still open from a previous journal file (for example, this may happen if the current journal file is very short at the time of the freeze), you may need to restore those journal files as well.

Thomas

You will start the restore at the file that was switched to (your .003 file), and that file contains metadata that allows us to find the oldest open transaction to roll back. The rollback as part of journal restore will scan backwards in the journal stream to find it if needed. If you need to know what that oldest file will be, you can get it via the RequiredFile output parameter of ExternalFreeze() or by calling %SYS.Journal.File:RequiredForRecovery() before calling ExternalFreeze(). Again though, you don't need to start the journal restore from that older file, just have it (and the journal.log to find it) available at restore time. So, if you're backing up and restoring all journals that are on the system, this basically takes care of itself.
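As a rough illustration of "start at the switched-to file, but keep older files around", here is a small Python sketch. It is a toy model, not Caché's logic: the function name, the list standing in for journal.log, and the .001/.002 file names are assumptions made up for the example (only 20170331.003 comes from the log above).

```python
from typing import Dict, List


def files_needed_for_restore(journal_log: List[str],
                             switched_to: str,
                             open_txn_start_file: Dict[int, str]) -> List[str]:
    """Return the journal files that must be available at restore time.

    journal_log         -- journal file names in creation order (toy journal.log)
    switched_to         -- the file ExternalFreeze switched to; restore starts here
    open_txn_start_file -- for each transaction open at freeze time, the file in
                           which it began (hypothetical metadata)
    """
    end = journal_log.index(switched_to)
    starts = [journal_log.index(name) for name in open_txn_start_file.values()]
    # Rollback may scan back as far as the oldest file holding an open transaction.
    first = min(starts + [end])
    return journal_log[first:end + 1]


log = ["20170331.001", "20170331.002", "20170331.003"]

# No open transactions: only the switched-to file is needed.
print(files_needed_for_restore(log, "20170331.003", {}))

# Transaction 7 began in .002 and is still open, so .002 must also be available.
print(files_needed_for_restore(log, "20170331.003", {7: "20170331.002"}))
```

In a real system the oldest required file is what RequiredFile or RequiredForRecovery() reports, as described above; the sketch only shows why that file can be older than the one the restore starts from.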

crash-consistent snapshot of the entire system, including all CACHE.DAT files, the manager directory, journals, and the WIJ.

Ray, may I ask you to clarify this a bit?

Can any snapshot of the entire system be considered crash-consistent?

 

If you have a true moment-in-time snapshot image of all the pieces of Caché (databases, WIJ, Journals, installation/manager directory, etc), then restoring that image is, to Caché, just as though the machine had crashed at that moment in time.  When the instance of Caché within that restored image starts up, all Caché's usual automatic recovery mechanisms that give you full protection against system crashes equivalently give you full protection in this restore scenario.

Whether a given snapshot can be considered crash-consistent comes from the underlying snapshotting technology, but in general that's what "snapshot" means.  The main consideration is that all of the filesystems involved in Caché are part of the same moment-in-time (sometimes referred to as a "consistency group").  It's no good if you take an image of the CACHE.DAT files from one moment in time with an image of the WIJ or Journals from another.
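A simple way to picture the "same moment in time" requirement is the Python check below. It is only a sketch under assumptions: the component names and the idea of comparing one timestamp per component are hypothetical, and real snapshot tooling reports consistency groups in its own way.

```python
from typing import Dict

# Hypothetical labels for the pieces that must belong to one consistency group.
REQUIRED = ("databases", "wij", "journals", "manager_directory")


def is_single_moment_snapshot(snapshot_times: Dict[str, float]) -> bool:
    """True only if every required piece is present and all share one timestamp."""
    if any(name not in snapshot_times for name in REQUIRED):
        return False
    return len({snapshot_times[name] for name in REQUIRED}) == 1


same_moment = {name: 1491300000.0 for name in REQUIRED}
mixed = dict(same_moment, wij=1491300005.0)  # WIJ captured five seconds later

print(is_single_moment_snapshot(same_moment))  # True
print(is_single_moment_snapshot(mixed))        # False
```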

Most production sites wouldn't plan their backups this way because it means that the only operation you can do on the backup image is restore the whole thing and start Caché.  You can't take one CACHE.DAT from there and get it to a consistent state.  But, in the case of snapshots of a VM guest, this does come up a fair bit, since it's simple to take an image of a guest and start it on other hardware.  

Let me know if you have questions.

Thank you, Ray.

Most production sites wouldn't plan their backups this way because it means that the only operation you can do on the backup image is restore the whole thing and start Caché.

Another reason may be the number of articles, docs, and learning materials that taught us to _always_ call ExternalFreeze, followed by ExternalThaw, whenever an external snapshot is taken.