IRIS mirror - arbiter status
Hi,
I have an IRIS mirror set
. all running on SUSE Linux
. each on a separate server
consisting of
Primary
Backup
Arbiter
My arbiter is the ISCAgent.
Question
=======
Is there a Linux command showing me the status of the ISCAgent as arbiter?
The systemctl service command below shows if the service is running but not if it is acting correctly as an arbiter.
I'm looking for a command like iris list, which you can run on the Primary or Backup.
===
systemctl list-units --type=service|grep -i isc
ISCAgent.service loaded active running InterSystems Agent
===
This is driven by a mirror member that points to an arbiter, so members need to be queried as to who their arbiter is. The arbiter itself is fairly passive.
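One OS-level approximation, run on the arbiter host itself, is to count which peers hold an established TCP connection to the ISCAgent port. The sketch below is only that: it assumes the agent listens on its default port 2188 and that `ss` (iproute2) is installed, and the function name `count_arbiter_peers` is made up for illustration. It shows network-level connections only, not whether the agent is arbitrating correctly.

```shell
#!/usr/bin/env bash
# Count distinct peers with an established TCP connection to the local
# ISCAgent port. Reads `ss -tn` output on stdin so it can be checked offline.
# Assumption: ISCAgent listens on its default port, 2188.
count_arbiter_peers() {
  local port="${1:-2188}"
  awk -v port="$port" '
    $1 == "ESTAB" && $4 ~ (":" port "$") {
      peer = $5
      sub(/:[0-9]+$/, "", peer)   # drop the ephemeral peer port
      seen[peer] = 1
    }
    END { n = 0; for (p in seen) n++; print n }
  '
}

# Usage on the arbiter host:
#   ss -tn | count_arbiter_peers
```

With both failover members connected to the arbiter, a count of 2 would be expected; a lower count suggests at least one member has no connection to the agent.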
Is there a way that we could get the status of the mirror from the arbiter, so it can kick off failover scripts to move the non-IRIS-related things to our failover node?
What would be the benefit of querying the arbiter (I'm not sure this is possible) for mirror status vs querying one of the failover members? I also wasn't sure what you meant by non IRIS related.
We wanted to know if we could get the status from the arbiter to tell a shell script at the OS level to kick off... to fail over the VIP and additional directories at the server level that are not part of the IRIS mirror.
I'm not sure I totally follow your architecture and why looking at the arbiter vs a failover member would be better. It would probably be good to discuss the details with your sales engineer (I'm guessing you're already speaking to them about this). Is the VIP not being handled automatically by the mirror? I saw you commented on some ZMIRROR related posts; I think that would be a good way for your mirror to kick off those non-IRIS failover items automatically on failover, rather than needing to poll.
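If polling a failover member from the OS is still wanted alongside or instead of ZMIRROR, a rough sketch could read the mirror status from `iris list` on that member. This assumes the output contains a line of the form `mirroring: Member Type = Failover; Status = Primary`; the function name `mirror_status`, the instance name, and the script path are placeholders, not anything from your setup.

```shell
#!/usr/bin/env bash
# Extract the mirror status from `iris list` output read on stdin.
# Assumption: the output has a line of the form
#   mirroring: Member Type = Failover; Status = Primary
mirror_status() {
  sed -n -E 's/^[[:space:]]*mirroring:.*Status = ([^;]*[^;[:space:]]).*$/\1/p' |
    head -n 1
}

# Hypothetical polling use on a failover member
# ("IRIS" is a placeholder instance name, the script path is made up):
#   if [ "$(iris list IRIS | mirror_status)" = "Primary" ]; then
#     /usr/local/bin/move_vip_here.sh
#   fi
```

Polling like this reacts only as fast as the polling interval, which is one reason ZMIRROR is likely the better fit.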
Do you know where I can get a sample of ZMIRROR to know how to code it?
I'm not aware of a sample in our modern documentation, but I've put in a request for that. Since pretty much any arbitrary code can be run in ZMIRROR, I don't think there's much of a guideline for coding ZMIRROR besides the documented tags.
I do think that something like the stubs we document for ZSTART/ZSTOP might be helpful, if that's what you had in mind.
Thanks.
The reason I would like to get the status of the IRIS arbiter process is that I had an IRIS DB "freeze" recently.
On checking the logs, both the Primary and Backup nodes indicated that they had lost their connection to the arbiter. Both nodes then went into a "waiting" state, which froze the application running on IRIS.
I would encourage you to review the loss of connectivity scenarios covered in our documentation. The arbiter disconnecting from both sides is not enough to trigger a hang.
https://docs.intersystems.com/irisforhealth20223/csp/docbook/Doc.View.cls?KEY=GHA_mirror_set
A hang suggests a total loss of network connection between failover members and the arbiter simultaneously. In general, the arbiter is a supplement to the communication between the two failover members, and not necessarily the first place to look for an issue.
For example, if the arbiter connections to the failover members were lost first, the primary and backup would remain in contact via their own ISCAgents. Then, if the two members lost contact with each other, the primary would have continued to operate.
Agreed, contact WRC on this and get your messages log from the primary and backup (from during the outage) reviewed. This sounds like a full loss of connectivity. Or, just a hunch: are both mirror members backed up at the same time, and does this coincide with the outage?
I based my view on the logs I got from Node 1, Node 2 & arbiter and the timestamps.
These are some samples:
Arbiter:
2023-01-17T15:54:56.773722+11:00 LIVEARB ISCAgent[16131]: Arbiter client error: Message read failed.
Node 1:
01/17/23-15:56:22:875 (18886) 2 [Utility.Event] Arbiter connection lost
01/17/23-15:56:23:663 (24774) 0 [Generic.Event] MirrorServer: Received new failover mode (Agent Controlled) from backup...(repeated 1 times)
Node 2:
01/17/23-15:56:23:407 (9272) 0 [Generic.Event] MirrorClient: Switched from Arbiter Controlled to Agent Controlled failover on request from primary
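To line up events like these across both nodes, a small filter can pull just the arbiter and failover-mode entries out of each messages.log. This is a sketch that assumes lines in the format of the Node 1 and Node 2 samples above; the function name `arbiter_events` is made up, and the log path depends on the installation.

```shell
#!/usr/bin/env bash
# Print "timestamp  message" for arbiter-related events from messages.log
# lines read on stdin (format as in the Node 1 / Node 2 samples).
arbiter_events() {
  grep -E 'Arbiter (connection|Controlled)|failover mode' |
    sed -E 's/^([^ ]+) \([0-9]+\) [0-9]+ \[[^]]+\] (.*)$/\1  \2/'
}

# Usage on each failover member (path is installation-specific):
#   arbiter_events < /path/to/mgr/messages.log
```

Running it on both members and comparing the timestamps against the arbiter's syslog entries makes it easier to see which connection dropped first.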