InterSystems IRIS and Caché Application Consistent Backups with Azure Backup

Database systems have very specific backup requirements that, in enterprise deployments, require forethought and planning. For database systems, the operational goal of a backup solution is to create a copy of the data in a state equivalent to the one left when the application is shut down gracefully. Application-consistent backups meet these requirements, and Caché provides a set of APIs that facilitate integration with external solutions to achieve this level of backup consistency.

These APIs are ExternalFreeze and ExternalThaw. ExternalFreeze temporarily pauses writes to disk; during this period Caché commits changes in memory. The backup operation must complete while writes are paused and must be followed by a call to ExternalThaw. That call engages the write daemons to write the updates cached in the global buffer pool (database cache) to disk and resumes normal Caché database write daemon operations. This process is transparent to Caché user processes. The specific API class methods are:

##Class(Backup.General).ExternalFreeze()

##Class(Backup.General).ExternalThaw()

These APIs, in conjunction with the new capability of Azure Backup to execute a script before and after a snapshot operation, provide a comprehensive backup solution for deployments of Caché on Azure. The pre/post scripting capability of Azure Backup is currently available only on Linux VMs.
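The ordering described above (freeze, snapshot, thaw) can be sketched in shell as follows. This is only an illustration: freeze_db, take_snapshot and thaw_db are hypothetical stand-ins that just print their names. In a real deployment the freeze and thaw steps would wrap the csession calls, and Azure Backup itself takes the snapshot between the pre and post scripts.

```shell
#!/bin/bash
# Hypothetical stand-ins; replace with real freeze/thaw calls in practice.
freeze_db()     { echo "freeze"; }
take_snapshot() { echo "snapshot"; }
thaw_db()       { echo "thaw"; }

# Take the snapshot only while the database is frozen, and always thaw.
backup_with_freeze() {
    local rc=0
    if ! freeze_db; then
        thaw_db                # precautionary thaw after a failed freeze
        return 1
    fi
    take_snapshot || rc=1      # remember a snapshot failure...
    thaw_db || rc=1            # ...but thaw regardless of the outcome
    return $rc
}

backup_with_freeze
```

The point of the sketch is that the thaw step runs on every path, so a failed snapshot never leaves the database frozen.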

Prerequisites

At a high level, there are three steps that you need to perform before you can back up a VM using Azure Backup:

  1. Create a Recovery Services vault
  2. Ensure the VM has the latest version of the VM Agent installed.
  3. Check network access to the Azure services from your VM. 

The Recovery Services vault manages the backup goals, policies, and the items to protect. You can create a Recovery Services vault via the Azure Portal or via scripting using PowerShell. Azure Backup requires an extension that runs in your VM and is controlled by the Linux VM agent, so the latest version of the agent is also required. The extension interacts with the external-facing HTTPS endpoints of Azure Storage and the Recovery Services vault. Secure access to those services from the VM can be configured using a proxy and network rules in an Azure Network Security Group.

For more information about these steps, see "Prepare your environment to back up Resource Manager-deployed virtual machines".

Pre and Post Scripting Configuration

The ability to call a script before and after the backup operation is included in the latest version of the Azure Backup Extension (Microsoft.Azure.RecoveryServices.VMSnapshotLinux). For information about how to install the extension, please check the detailed feature documentation.

By default, the extension includes sample pre and post scripts, located in your Linux VM at:

/var/lib/waagent/Microsoft.Azure.RecoveryServices.VMSnapshotLinux-1.0.9110.0/main/tempPlugin

These scripts need to be copied to the following locations, respectively:

/etc/azure/prescript.sh
/etc/azure/postScript.sh
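As a minimal sketch of the copy step, the helper below copies one script into place and restricts it to its owner. install_hook is a hypothetical helper, not part of Azure Backup, and mode 700 is an assumption about the permissions expected for these scripts; check the Azure Backup documentation for the exact requirements.

```shell
#!/bin/bash
# install_hook SRC DEST: hypothetical helper that copies a hook script to DEST
# and makes it readable, writable and executable by its owner only.
install_hook() {
    local src="$1" dest="$2"
    [ -f "$src" ] || { echo "missing source: $src" >&2; return 1; }
    mkdir -p "$(dirname "$dest")" || return 1
    # install(1) copies the file and sets its mode in one step.
    # Mode 700 is an assumption; verify it against the Azure documentation.
    install -m 700 "$src" "$dest"
}
```

Run as root, a call might look like: install_hook SAMPLE_PRE_SCRIPT /etc/azure/prescript.sh, where SAMPLE_PRE_SCRIPT stands for the sample file found in the tempPlugin directory shown above.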

You can also download the script template from GitHub.

For Caché, the prescript.sh script is where a call to the ExternalFreeze API can be implemented, and postScript.sh should contain the implementation that executes ExternalThaw.

The following is a sample prescript.sh implementation for Caché.

#!/bin/bash
# variables used for returning the status of the script
success=0
error=1
warning=2
status=$success
log_path="/etc/preScript.log"   # path of log file
printf "Logs:\n" > "$log_path"
# TODO: Replace <CACHE INSTANCE> with the name of the running instance
csession <CACHE INSTANCE> -U%SYS "##Class(Backup.General).ExternalFreeze()" >> "$log_path"
status=$?
# csession exits with 5 when the freeze succeeds and 3 when it fails
if [ $status -eq 5 ]; then
    echo "SYSTEM IS FROZEN"
    printf "SYSTEM IS FROZEN\n" >> "$log_path"
    # return the success code rather than the raw csession status
    status=$success
elif [ $status -eq 3 ]; then
    echo "SYSTEM FREEZE FAILED"
    printf "SYSTEM FREEZE FAILED\n" >> "$log_path"
    status=$error
    # make sure the instance is not left frozen after a failed freeze
    csession <CACHE INSTANCE> -U%SYS "##Class(Backup.General).ExternalThaw()"
fi

exit $status

The following is a sample postScript.sh implementation for Caché.

#!/bin/bash
# variables used for returning the status of the script
success=0
error=1
warning=2
status=$success
log_path="/etc/postScript.log"   # path of log file
printf "Logs:\n" > "$log_path"
# TODO: Replace <CACHE INSTANCE> with the name of the running instance
csession <CACHE INSTANCE> -U%SYS "##class(Backup.General).ExternalThaw()"
status=$?
# csession exits with 5 when the thaw succeeds and 3 when it fails
if [ $status -eq 5 ]; then
    echo "SYSTEM IS UNFROZEN"
    printf "SYSTEM IS UNFROZEN\n" >> "$log_path"
    # return the success code rather than the raw csession status
    status=$success
elif [ $status -eq 3 ]; then
    echo "SYSTEM UNFREEZE FAILED"
    printf "SYSTEM UNFREEZE FAILED\n" >> "$log_path"
    status=$error
fi
exit $status
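Both samples compare the csession exit status against 5 (success) and 3 (failure), while the scripts themselves define success=0 and error=1. The translation between the two conventions can be isolated in a small function. This is a sketch: to_exit_code is a hypothetical helper name, and treating any unexpected status as an error is a design choice made here, not something the samples prescribe.

```shell
#!/bin/bash
# to_exit_code STATUS: hypothetical helper that maps a csession exit status
# (5 = success, 3 = failure, as in the samples) to the 0/1 convention
# used by the scripts' success and error variables.
to_exit_code() {
    case "$1" in
        5) echo 0 ;;   # freeze or thaw succeeded
        *) echo 1 ;;   # 3, or any unexpected status, is treated as an error
    esac
}
```

In a script this could be used as status=$(to_exit_code $?) immediately after the csession call, so that an unexpected status is never passed through unchanged.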

Executing a Backup

In the Azure Portal, you can trigger the first backup by navigating to the Recovery Services vault. Note that the VM snapshot should take only a few seconds, whether it is the first backup or a subsequent one. Data transfer for the first backup takes longer, but it starts only after the post-script has thawed the database, so it should not affect the time between the pre and post scripts.
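To check how long the database actually stays frozen between the pre and post scripts, the two scripts could record and compare timestamps. The helpers below are a hypothetical sketch; mark_frozen, frozen_seconds and the stamp file path are names chosen for this illustration and are not part of Caché or Azure Backup.

```shell
#!/bin/bash
# Assumed location for the timestamp file; override with STAMP_FILE if needed.
stamp_file="${STAMP_FILE:-/tmp/cache_freeze.stamp}"

# Call from the pre-script after a successful freeze: records the current time.
mark_frozen() { date +%s > "$stamp_file"; }

# Call from the post-script after the thaw: prints the elapsed seconds.
# An optional argument supplies the end time (in epoch seconds) explicitly.
frozen_seconds() {
    local start now
    start=$(cat "$stamp_file") || return 1
    now=${1:-$(date +%s)}
    echo $(( now - start ))
}
```

Appending the printed value to the post-script log gives a simple record of the freeze window for each backup.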

It is highly recommended to regularly restore your backup in a non-production setting and perform database integrity checks to ensure your data protection operations are effective.

For more information about how to trigger the backup and other topics such as backup scheduling, please check Back up Azure virtual machines to a Recovery Services vault.  

Comments

I see this was written in March 2017. By chance has this ability to Freeze / Thaw Cache on Windows VM's in Azure been implemented yet?

Can a brief description of why this cannot be performed on Windows VM's in Azure be given?

Thanks for the excellent research and information, always appreciated.

Hi Dean - thanks for the comment.  There are no changes required from a Caché standpoint; however, Microsoft would need to add similar functionality to Windows to allow Azure Backup to call a script within the target Windows VM, similar to how it is done with Linux.  The scripting from Caché would be exactly the same on Windows, except for using .BAT syntax rather than Linux shell scripting, once Microsoft provides that capability.  Microsoft may already have this capability.  I'll have to look to see if they have extended it to Windows as well.

Regards,
Mark B-

Microsoft only added this functionality to Linux VMs to get around the lack of a VSS-equivalent technology in Linux.

They expect Windows applications to be compatible with VSS.

We have previously opened a request for InterSystems to add VSS support to Caché but I don't believe progress has been made on it.

Am I right in understanding that IF we are happy with crash-consistent backups, as long as a backup solution is a point-in-time snapshot of the whole disk system (including journals and database files) then said backup solution should be safe to use with Caché?

Obviously application consistent is better than crash consistent, but with WIJ in there we should be safe.

We are receiving more and more requests for VSS integration, so there may be some movement on it, however no guarantees or commitments at this time.  

In regards to the alternative of a crash-consistent backup: yes, it would be safe as long as the databases, WIJ, and journals are all included and share a consistent point-in-time snapshot.  The databases in the backup archive may be "corrupt"; they become physically accurate only after Caché is started and the WIJ and journals are applied.  Just like you said, with a crash-consistent backup the WIJ recovery is key to a successful recovery.

I will post back if I hear of changes coming with VSS integration.

Thanks for the reply Mark, that confirms our understanding. Glad we're not the only people asking for VSS support!

Hi all,

Please note that these scripts are also usable with IRIS.  In each of the 'pre' and 'post' scripts you only need to change each of the "csession <CACHE INSTANCE> ..." references to "iris session <IRIS INSTANCE> ..."

Regards,
Mark B-