ISCAgent Auto-Startup in a Mirror Configuration on a Veritas Cluster

ISCAgent is installed automatically with Caché, runs as a service, and can be configured to
start with the system. That works well on a standalone host, but it becomes a problem on a VCS
cluster with mirroring enabled. In the procedure for installing a single instance of Caché in a
cluster, step 2 says: “Create a link from /usr/local/etc/cachesys to the shared disk. This forces
the Caché registry and all supporting files to be stored on the shared disk resource you have
configured as part of the service group.”

On the second, passive node, this means ISCAgent can neither start with the system nor be started
manually, because all of its binaries are on the shared disk, which is mounted on the primary node.
Only after Caché fails over, when the shared disk is mounted locally, can the agent be started. So
in a disaster failover scenario, Caché starts automatically on the second node without ISCAgent,
throws all sorts of errors, and locks the database with sign-on inhibited for all connections until
someone connects manually and starts the agent. What I am looking for is a mechanism that also
starts ISCAgent automatically during failover to the secondary node.

What suggestions are there for making ISCAgent start automatically on the second node when the cluster fails over in a VCS configuration like this?


Answers

Anzelem,

If your Mirror configuration requires special attention, you can use the ^ZMIRROR routine to run code at certain points in the Mirror Failover process.  It sounds as though you need to add code to start up the ISCAgent into $$CanNodeStartToBecomePrimary^ZMIRROR().   The latest Docs on ZMIRROR are below, but entry points such as this have existed in many versions of Caché.

 

http://docs.intersystems.com/latest/csp/docbook/DocBook.UI.Page.cls?KEY=...

Hi Jeffrey;

Isn't that too late? For this to be processed, the ISCAgent needs to be up and running already.

A common message in the console.log is this one, which appears before any mirror checks happen:

"Failed to verify Agent connection...(repeated 5 times"

Yes, that's correct. The ISCAgent should be running on the backup member at all times. The "Application Considerations" section of the "Using Veritas Cluster Server for Linux with Caché" appendix of the High Availability Guide
(http://docs.intersystems.com/latest/csp/docbook/DocBook.UI.Page.cls?KEY=GHA_veritas_clusters#GHA_veritas_clusters_app_considerations) says:

  • If any Caché instance that is part of a failover cluster is to be added to a Caché mirror, the ISCAgent, which is installed with Caché, must be properly configured; see Configuring the ISCAgent in the “Mirroring” chapter of this guide for more information. For any node on which Caché was not installed as part of cluster setup (see Installing a Single Instance of Caché), install the ISCAgent on that node (see Installing the Arbiter in the “Mirroring” chapter for information about using a standalone ISCAgent install kit for this purpose) and then configure it.

We had a discussion about this some time back and concluded there is no reason not to have the agent running at all times on all mirror members regardless of their status in the cluster.

Anzelem, maybe I should move this up into the "Install a Single Instance of Caché" procedures in all the cluster appendixes?

Hi Bob;

I would like you to understand where the complication comes from. It is a bit further up on that page, in "Install a Single Instance of Caché", step 2: "Create a link from /usr/local/etc/cachesys to the shared disk. This forces the Caché registry and all supporting files to be stored on the shared disk resource you have configured as part of the service group." The page then suggests commands to run.

Because the default install directory is linked out to the shared disk, you cannot install a standalone ISCAgent kit on the second node while the cluster disks are not present. Typically you will get this:

[]# pwd

/usr/local/etc

[]# ls -al cachesys

lrwxrwxrwx. 1 root root 43 May 28  2015 cachesys -> /tcl_prod_db/labsys/usr/local/etc/cachesys/ (This path resides on a cluster disk).

[]# cd /usr/local/etc/cachesys

-bash: cd: /usr/local/etc/cachesys: No such file or directory

 

The default install directory of ISCAgent is the same path that is linked out to the shared cluster disks; hence the complication, and why I am reaching out.
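The failure shown above is what a dangling symbolic link looks like: the link exists, but its target is on a disk that is not mounted on this node. As a minimal sketch, a script could test for that condition before trying to use the directory. The function name here is hypothetical and not part of any InterSystems kit.

```shell
# Hypothetical helper: succeeds when the path is a symlink whose target is missing,
# e.g. /usr/local/etc/cachesys on a node where the cluster disk is not mounted.
is_dangling_link() {
    # -L is true for a symlink; -e follows the link and is false if the target is absent
    [ -L "$1" ] && [ ! -e "$1" ]
}

# Example use (commented out so the sketch has no side effects):
# if is_dangling_link /usr/local/etc/cachesys; then
#     echo "shared disk not mounted on this node" >&2
# fi
```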

I also agree that the ISCAgent can run on each node independently. There is no strong reason for its binaries to follow the cluster resources.

I previously wrote this to the WRC and am still waiting for them to confirm whether it is a viable alternative.

 

""""

The option I have been considering all along, which could be an easy way forward, is for you to re-package the ISCAgent installer so that it installs into a different directory instead of the default one. The default directory is what causes the trouble, because it is linked back to the cluster disk.

By this I mean that on the secondary node, without the cluster disks, this is what you encounter:

 

[]# pwd

/usr/local/etc

[]# ls -al cachesys

lrwxrwxrwx. 1 root root 43 May 28  2015 cachesys -> /tcl_prod_db/labsys/usr/local/etc/cachesys/ (This path resides on a cluster disk).

[]# cd /usr/local/etc/cachesys

-bash: cd: /usr/local/etc/cachesys: No such file or directory

 

So in this scenario I cannot install the ISCAgent independently with its defaults, because it fails as above.

We cannot touch that link, as doing so would break the cluster failover.

 

So the modifications I am talking about are:

  1. Change the default directory by creating a new one, ‘/usr/local/etc/iscagent’.
  2. Modify the /etc/init.d/ISCAgent script, changing the line AGENTDIR=${CACHESYS:-"/usr/local/etc/cachesys"} to AGENTDIR=${CACHESYS:-"/usr/local/etc/iscagent"}.

 

After the installation, this seems achievable as follows:

  1. rsync -av /usr/local/etc/cachesys/* /usr/local/etc/iscagent/
  2. Edit /etc/init.d/ISCAgent as described in step 2 above.

My concern is that the installer may contain other references to the default directory that I am not aware of. Hence my suggestion that you re-package it with the modifications above.
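The two steps above can be sketched as a small shell function. This is only an illustration of the copy-and-edit idea, not a validated procedure: the function name is hypothetical, it uses cp -a as a stand-in for the rsync command so that it does not depend on rsync being installed, and it assumes the service script contains the AGENTDIR line exactly as quoted above.

```shell
# Hypothetical sketch: copy the agent directory to a local path and repoint
# the service script at it. Arguments: source dir, destination dir, init script.
relocate_iscagent() {
    src=$1
    dst=$2
    init=$3

    mkdir -p "$dst" || return 1
    # Copy contents, preserving ownership, modes and timestamps
    # (stand-in for: rsync -av "$src"/* "$dst"/).
    cp -a "$src"/. "$dst"/ || return 1

    # Rewrite only the default in the AGENTDIR line; a .bak copy of the
    # original script is kept next to it.
    sed -i.bak "s|AGENTDIR=\${CACHESYS:-\"[^\"]*\"}|AGENTDIR=\${CACHESYS:-\"$dst\"}|" "$init"
}

# Intended use, run on the node that currently has the cluster disk mounted:
# relocate_iscagent /usr/local/etc/cachesys /usr/local/etc/iscagent /etc/init.d/ISCAgent
```

Because the edit only changes the fallback value, a CACHESYS environment variable set elsewhere would still override it, which is one of the "other references" worth checking for.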

 

This way ISCAgent becomes independent and resides locally on both nodes (primary and secondary failover node), since its binaries do not need to follow the cluster resources. It also allows /etc/init.d/ISCAgent to start automatically with the OS.


"""'''

Yes, now I see the problem. I didn't realize there is no option for installing a standalone agent in a different directory; I've never done it, and I assumed that since you were installing it on a system without Caché you would be able to choose the installation directory. (Of course, if that were the case, it should have been noted in the ISCAgent section in the Mirroring chapter.) Adding that as you suggest seems to make a lot of sense, but it would need to be validated by Tom Woodfin and the mirroring team.

Have you filed a prodlog on this?

Hi Bob;

I would appreciate it if you could put me in touch with Tom Woodfin and the mirroring team. The calls to look through are 861211, which was a continuation of 854501. They contain all sorts of suggestions, but it would help if this one could be sent back for validation.

I have alerted Tom and Ray Fucillo to this post, Anzelem. Hopefully they will respond soon.

This should be scripted to be done as part of the cluster failover process, much in the same way you previously had it configured to be scripted on system startup.

Hi Pete;

Unfortunately, ISCAgent is not part of cluster service groups. The ISC Veritas 'online' script only does the Cache portion of it.

Hi Anzelem,

Here are the steps that need to be defined in your VCS cluster resource group with dependencies.

  • Remount the storage (not new)
  • Relocate the cluster IP (not new)
  • Simple VCS application/script agent to restart the ISCAgent (THIS IS NEW)
  • ISC VCS cluster agent to start Caché (not new; make the previous step a dependency before executing)

The script to start the ISCAgent would be dependent on the storage being mounted in the first step. 
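As a rough illustration of the new step, here is a minimal wrapper script of the kind a generic VCS Application resource could call. It is a sketch under several assumptions, not a tested agent: the function names are hypothetical, the process name ISCAgent is assumed, and the monitor exit codes (110 for online, 100 for offline) follow my understanding of the Application agent's MonitorProgram convention and should be checked against the VCS documentation for your release.

```shell
#!/bin/sh
# Hypothetical start/stop/monitor wrapper for ISCAgent.
AGENT_INIT=${AGENT_INIT:-/etc/init.d/ISCAgent}   # service script named in this thread
AGENT_PROC=${AGENT_PROC:-ISCAgent}               # assumed process name

# True when a process with exactly this name is running.
agent_running() {
    pgrep -x "$AGENT_PROC" >/dev/null 2>&1
}

# Start the agent only if it is not already running.
agent_start() {
    agent_running || "$AGENT_INIT" start
}

# Stop the agent only if it is running.
agent_stop() {
    if agent_running; then "$AGENT_INIT" stop; fi
}

# Report state using the assumed MonitorProgram codes: 110 online, 100 offline.
agent_monitor() {
    if agent_running; then return 110; else return 100; fi
}

# Dispatch only when an action is given, so the file can also be sourced.
if [ $# -gt 0 ]; then
    case "$1" in
        start)   agent_start ;;
        stop)    agent_stop ;;
        monitor) agent_monitor; exit $? ;;
        *)       echo "usage: $0 start|stop|monitor" >&2; exit 2 ;;
    esac
fi
```

The resource that runs this script would be made dependent on the mount resource, and the Caché resource would in turn depend on it, which matches the ordering in the list above.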

This should provide you with the full automation needed here.  Let me know if there are any concerns or problems with the above steps.

Regards,

Mark B-

Hi Anzelem,

May I ask you a couple of questions on your DR solution?

Which node would take over on Primary failure, the Caché mirror backup or the VCS secondary, if both are alive?

More generally: what is the main reason for mixing two different DR approaches?

Thanks

 

Dear Alexey;

We do not have 2 different DR approaches.

The mirror configuration consists only of the Primary (at the Production site) and a DR async (at the DR site), so two instances in total.

The Production site has two physical boxes in a Veritas Cluster configuration for HA purposes. Should the first one have an issue, Caché fails over to the second node and still comes up as Primary. Should both nodes fail, or should we lose the entire Production site, we promote the DR async instance. The same applies to the DR site. In this environment, failing over to DR is not automatic; the decision has to be announced first.

Hi Mark;

I like it; those are logical steps to follow. Last time I checked, you did not have a Veritas lab environment to validate this, and the moment ISCAgent becomes a cluster resource it needs to conform to the Veritas facets, e.g. start, monitor, offline, etc. My instance runs only in production, so we have little room to experiment. Hence the other quick and easy option: the suggestion to break out the ISCAgent directory. I just tested the 'rsync' copy and the directory edit in the service script, and the agent seems to start up well.