#High Availability

5 Followers · 53 Posts

High availability (HA) refers to the goal of keeping a system or application operational and available to users a very high percentage of the time, minimizing both planned and unplanned downtime.

Documentation.

Article Mark Bolinsky · Jul 1, 2016 17m read

++Update: August 2, 2018

This article provides a reference architecture as a sample for providing robust performing and highly available applications based on InterSystems Technologies that are applicable to Caché, Ensemble, HealthShare, TrakCare, and associated embedded technologies such as DeepSee, iKnow, Zen and Zen Mojo.

Azure has two different deployment models for creating and working with resources: Azure Classic and Azure Resource Manager. The information detailed in this article is based on the Azure Resource Manager model (ARM).

4
0 12708
Article Bob Binstock · Sep 6, 2016 19m read

Mirroring 101

Caché mirroring is a reliable, inexpensive, and easy to implement high availability and disaster recovery solution for Caché and Ensemble-based applications. Mirroring provides automatic failover under a broad range of planned and unplanned outage scenarios, with application recovery time typically limited to seconds. Logical data replication eliminates storage as a single point of failure and a source of data corruption. Upgrades can be executed with little or no downtime.

22
3 7935
Article Mark Bolinsky · Feb 12, 2019 32m read

The Amazon Web Services (AWS) Cloud provides a broad set of infrastructure services, such as compute resources, storage options, and networking that are delivered as a utility: on-demand, available in seconds, with pay-as-you-go pricing. New services can be provisioned quickly, without upfront capital expense. This allows enterprises, start-ups, small and medium-sized businesses, and customers in the public sector to access the building blocks they need to respond quickly to changing business requirements.

Updated: 10-Jan, 2023 

3
12 7192
Article Mark Bolinsky · Mar 18, 2016 9m read

++ Update: August 1, 2018

The use of the InterSystems Virtual IP (VIP) address built-in to Caché database mirroring has certain limitations. In particular, it can only be used when mirror members reside the same network subnet. When multiple data centers are used, network subnets are not often “stretched” beyond the physical data center due to added network complexity (more detailed discussion here). For similar reasons, Virtual IP is often not usable when the database is hosted in the cloud.

Network traffic management appliances such as load balancers (physical or virtual) can be used to achieve the same level of transparency, presenting a single address to the client applications or devices. The network traffic manager automatically redirects clients to the current mirror primary’s real IP address. The automation is intended to meet the needs of both HA failover and DR promotion following a disaster. 

12
6 7032
Article Mark Bolinsky · Oct 12, 2018 31m read

Google Cloud Platform (GCP) provides a feature rich environment for Infrastructure-as-a-Service (IaaS) as a cloud offering fully capable of supporting all of InterSystems products including the latest InterSystems IRIS Data Platform. Care must be taken, as with any platform or deployment model, to ensure all aspects of an environment are considered such as performance, availability, operations, and management procedures.  Specifics of each of those areas will be covered in this article.

0
3 4800
Article Anton Umnikov · Jan 21, 2021 26m read

In this article, we’ll build a highly available IRIS configuration using Kubernetes Deployments with distributed persistent storage instead of the “traditional” IRIS mirror pair. This deployment would be able to tolerate infrastructure-related failures, such as node, storage and Availability Zone failures. The described approach greatly reduces the complexity of the deployment at the expense of slightly extended RTO.

16
8 4056
Article Mark Bolinsky · Jul 10, 2018 4m read

Often InterSystems technology architect team is asked about recommended storage arrays or storage technologies.  To provide this information to a wider audience as reference, a new series is started to provide some of the results we have encountered with various storage technologies.  As a general recommendation, all-flash storage is highly recommended with all InterSystems products to provide the lowest latency and predictable IOPS capabilities.

The first in the series was the most recently tested Netapp AFF A300 storage array.  This is middle-tier type storage array with several higher models above it.  This specific A300 model is capable of supporting a minimal configuration of only a few drives to hundreds of drives per HA pair, and also capable of being clustered with multiple controller pairs for tens of PB's of disk capacity and hundreds of thousands of IOPS or higher. 

0
0 3557
Article Pete Greskoff · Jan 10, 2017 9m read

NB. Please be advised that PKI is not intended to produce certificates for secure production systems. You should make alternate arrangements to create certificates for your productions.
NB. PKI is deprecated as of IRIS 2024.1: documentation and announcement.

In this post, I am going to detail how to set up a mirror using SSL, including generating the certificates and keys via the Public Key Infrastructure built in to Caché. The goal of this is to take you from new installations to a working mirror with SSL, including a primary, backup, and DR async member, along with a mirrored database.

7
0 2794
Question Anzelem Sanyatwe · Oct 25, 2016

ISCAgent is automatically installed with Cache, runs as a service and can be configured to
start with the system. This is fine – but the complication comes when this is on VCS clusters with
Mirroring on. When installing a Single Instance of Cache in a Cluster, point number 2. Says “Create
a link from /usr/local/etc/cachesys to the shared disk. This forces the Caché registry and all
supporting files to be stored on the shared disk resource you have configured as part of the
service group.

15
0 1930
Article Pete Greskoff · Jun 27, 2018 8m read

NB. Please be advised that PKI is not intended to produce certificates for secure production systems. You should make alternate arrangements to create certificates for your productions.
NB. PKI is deprecated as of IRIS 2024.1: documentation and announcement.

In this post, I am going to detail how to set up a mirror using SSL, including generating the certificates and keys via the Public Key Infrastructure built in to InterSystems IRIS Data Platform. I did a similar post in the past for Caché, so feel free to check that out here if you are not running InterSystems IRIS. Much like the original, the goal of this is to take you from new installations to a working mirror with SSL, including a primary, backup, and DR async member, along with a mirrored database. I will not go into security recommendations or restricting access to the files. This is meant to just simply get a mirror up and running. Example screenshots are taken on a 2018.1.1 version of IRIS, so yours may look slightly different.

3
4 1862
Article Anton Umnikov · May 31, 2021 6m read

All source code to the article is available at: https://github.com/antonum/ha-iris-k8s 

In the previous article, we discussed how to set up IRIS on k8s cluster with high availability, based on the distributed storage, instead of traditional mirroring. As an example, that article used the Azure AKS cluster. In this one, we'll continue to explore highly available configurations on k8s. This time, based on Amazon EKS (AWS managed Kubernetes service) and would include an option for doing database backup and restore, based on Kubernetes Snapshot.

Installation

Let's get right to business.

2
0 1605
Article Bob Binstock · Sep 7, 2016 6m read

Mirror Outage Procedures

Caché mirroring is a reliable, inexpensive and easy to implement high availability and disaster recovery solution for Caché and Ensemble-based applications. This article provides an overview of recommended procedures for dealing with a variety of planned and unplanned mirror outage scenarios. (For detailed information about mirroring and a wide range of mirror-related procedures, see Mirroring 101.)

A Caché mirror typically consists of two Caché instances on physically independent hosts, called failover members

0
1 1245
Article Luis Angel Pérez Ramos · Apr 25, 2023 12m read

A common need for our customers is to configure both HealthShare HealthConnect and IRIS in high availability mode.

It's common for other integration engines on the market to be advertised as having "high availability" configurations, but that's not really true. In general, these solutions work with external databases and therefore, if these are not configured in high availability, when a database crash occurs or the connection to it is lost, the entire integration tool it becomes unusable.

4
4 1039
Article Oliver Wilms · Aug 4, 2021 3m read

I have been working on redesigning a Health Connect production which runs on a mirrored instance of Healthshare 2019. We were told to take advantage of containers. We got to work on IRIS 2020.1 and split the database part from the Interoperability part. We had the IRIS mirror running on EC2 instances and used containers to run IRIS interoperability application. Eventually we decided to run the data tier in containers as well. Later we switched from using EC2 instances to Fargate “server-less” compute.

14
2 1027
Question Paul Coviello · Nov 16, 2016

looking for what the timeout value is for the connection for shadowing. and if it is known the timeout for mirroring also.

I'm currently getting this error and when we go to mirroring it would be good to know also, this way we can go back to our network vendor and give them as much info as possible.

10/19/16-12:19:40:696 (d30b6) 2 [SHADOWING] SHADOW SERVER (PRODSHAD): <ZREAD>ReadNone+3^SHDWUTIL;ERROR #1071: TCP read timed out - remote server is not responding (repeated 16 times)

thanks

Paul 

1
0 990
Question Scott Roth · Nov 13, 2018

We are trying to script a High Availability Shutdown/Start script in case we need to fail over to one of our other servers we can be back up within mins. Is there a way to configure the startup procedure to Automatically Stop/Start the JDBC server when shutting down or starting up cache? is there an auto setting we can change?

Thanks

Scott Roth

The Ohio State University Wexner Medical Center

10
0 891
Article Roy Leonov · Mar 12, 2024 5m read

As an IT and cloud team manager with 18 years of experience with InterSystems technologies, I recently led our team in the transformation of our traditional on-premises ERP system to a cloud-based solution. We embarked on deploying InterSystems IRIS within a Kubernetes environment on AWS EKS, aiming to achieve a scalable, performant, and secure system. Central to this endeavor was the utilization of the AWS Application Load Balancer (ALB) as our ingress controller.

3
9 797
Question Jon Astle · Nov 9, 2017

I have Ensemble/Healthshare running in a production environment which is setup with a mirror failover and an arbiter sitting between them.

In the event of a failover we have a number of connections that need stopping/monitoring and starting in a certain order.

Is there a programmatic way we can detect the failover and stop certain services and operations immediately and then start them up again in the required order, checking their connection state before starting the next connection.

I am thinking Ens.Director is probably what I need however I need some guidance on how to implement a solution.

2
0 644
Article Myles Collins · Jul 22, 2025 6m read

Are you familiar with SQL databases, but not familiar with IRIS?  Then read on...

About a year ago I joined InterSystems, and that is how IRIS got on my radar.  I've been using databases for over 40 years—much of that time for database vendors—and assumed IRIS would be largely the same as the other databases I knew.

4
3 611
Article Ariel Glikman · Jan 13, 2025 3m read

You may have noticed that to configure a mirror for InterSystems IRIS for Health and HealthShare® Health Connect there is a special requirement. I wanted to go through it step by step in this article.

This assumes you have already configured the second failover member and confirmed a successful failover member status in the mirror monitor:

Step 1: Enable HS_Services user (on backup and primary

Step 2: Switch to Namespace HSSYS and go to Interoperability > Configure > Credentials.

5
6 606
Question Gerry Connolly · Oct 31, 2017

Is it possible to make the cache terminal available over a mirrored vip address for a healthshare mirrored environment? So that connecting to a terminal for a mirrored environment will always connect to the Live Node?

I'm looking to write a Powershell script to run against the system and need to connect to the Live Node in a mirrored setup. Is this possible or am I going to have to log onto each node to establish which is Live.  Or does this even matter?

4
0 584
Article Alex Woodhead · Jan 28, 2023 3m read

Some Usage cases

1. A deployment may consist of two high availability instances and two disaster recovery instances in a different data center.

The corresponding UAT environment could replicate this giving a total of 8 instances. How do you confirm CPF and Scheduled task alignment across ALL instances.

2. Another team (possibly in anther organization) makes changes to an IRIS instance to correct a problem, improve security, or modify shared system task configuration. Capture the CPF before and after to see what was done across instances.

2
0 555
Discussion David Little · May 16, 2017

Hi All

I'm looking for some field experiences, lessons learned, or actual deployed solutions to the problem of replicating non-CACHE.DAT data in a mirrored Cache environment.  

Environment:

  • Operating System is IBM AIX
  • Hardware is IBM Power w/ virtualised storage
  • Cache 2017.1
  • 2x Production mirror members
  • 1x DR mirror member

There are a number of different data types that we're looking at mirroring, and a number of identified solutions.

1
0 524
Article Oliver Wilms · Aug 22, 2021 2m read

I have described my efforts to optimize IRIS Mirror deployment in AWS ElasticContainer Service (ECS) in my prior article.

IRIS Mirror in the cloud (AWS) | InterSystems Developer Community | AWS
 

I have come to the opinion that IRIS Mirror is not as reliable as needed when deployed in ECS. The root of the problem is the fact that ECS randomly assigns one of the available IP addresses to each EC2 host or Fargate task it starts.

These get stored in iris.cpf file in MapMirrors section as shown here:

[MapMirrors.IRISMIRROR]

FAILOVER1=10.2ab.1cd.146,2188,,10.2ab.1cd.

0
0 485
Question Scott Roth · Mar 10, 2023

I am looking into creating a ZSTOP as you probably have seen from my previous posts, is there a way to capture the type of shutdown that occurred? So say if there was an unknown hardware failure (forced), vs a user shutdown? Mainly looking for user or system shutdown when we force another destination to become the primary in the mirror. So if a user shutdown the production to do.,... Task A, Task B etc..

Thanks

Scott 

3
0 485
Question Scott Roth · Feb 21, 2023

I am new to setting up a mirror environment....

We will have a Arbiter, Two Failover members (A,B), and a Async (DR) member (C). I have the two failover members in sync and are configured for Arbiter Control.

My question is about the Async member, when I initially set it up I pointed it to the mirror on the primary node A.

Is that correct?

What happens to the Async member (C) if we fail over the mirror to the secondary member B?

8
0 462