My opinion: IRIS Mirror not as reliable as expected in AWS Elastic Container Service
In my previous article, I described my efforts to optimize IRIS Mirror deployment in AWS Elastic Container Service (ECS).
I have concluded that IRIS Mirror is not as reliable as needed when deployed in ECS. The root of the problem is that ECS randomly assigns one of the available IP addresses to each EC2 host or Fargate task it starts, so a failover member can come back with a different address every time its task is restarted.
These addresses are stored in the MapMirrors section of the iris.cpf file.
To let the IRIS Mirror Manager communicate between the failover members, I first added code to the ZSTU startup routine that updates the IP addresses when IRIS starts. The current IP addresses come from files that the container entrypoint script updates.
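The article does not show how the entrypoint script obtains the addresses. As a minimal sketch, assuming the task runs where the ECS task metadata endpoint version 4 is available (exposed through the ECS_CONTAINER_METADATA_URI_V4 environment variable), a container can read its own IPv4 address from that endpoint and write it to a file for a startup routine to pick up. The function names are hypothetical, and the sketch assumes each member's file lives somewhere the other member can read it (for example, a shared volume).

```python
import json
import os
import urllib.request


def container_ipv4(metadata: dict) -> str:
    """Return the first IPv4 address listed in an ECS container metadata (v4) response."""
    for network in metadata.get("Networks", []):
        addresses = network.get("IPv4Addresses", [])
        if addresses:
            return addresses[0]
    raise ValueError("no IPv4 address found in container metadata")


def fetch_container_metadata() -> dict:
    """Query the ECS task metadata endpoint v4 for this container (only works inside ECS)."""
    base = os.environ["ECS_CONTAINER_METADATA_URI_V4"]
    with urllib.request.urlopen(base, timeout=5) as response:
        return json.load(response)


def write_ip_file(ip: str, path: str) -> None:
    """Write the address to a file that a startup routine such as ZSTU could read later."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(ip + "\n")
```

For example, an entrypoint could run write_ip_file(container_ipv4(fetch_container_metadata()), path) before starting IRIS, where path is whatever hypothetical location the partner member also reads.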
This worked until this happened:
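I had two tasks running on ip.133 (failover2) and ip.168 (failover1).
I updated my code and proceeded to test it by stopping both tasks so ECS would start two new tasks.
As a result, the new task on ip.168, using the failover2 volume, became Primary, while the new task on ip.146, running on the failover1 volume, had a Mirror Status of Stopped. In other words, ECS gave address ip.168 to the task holding the failover2 volume, whereas before the restart that address belonged to failover1.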
Log from the new task on ip.146 (failover1 volume):
08/17/21-13:24:01:380 (849) 0 [Utility.Event] Instance 'IRIS' starting on node ip-10-2ab-1cd-146.us-gov-west-1.compute.internal by user irisuser
08/17/21-13:24:02:236 (849) 2 [Utility.Event] System appears to have failed over from node ip-10-2ab-1cd-168.us-gov-west-1.compute.internal
08/17/21-13:24:03:781 (857) 2 [Utility.Event] Mirroring not started, this instance appears to have been copied. See ^MIRROR
Log from the new task on ip.168 (failover2 volume):
08/17/21-13:24:01:426 (855) 0 [Utility.Event] Instance 'IRIS' starting on node ip-10-2ab-1cd-168.us-gov-west-1.compute.internal by user irisuser
08/17/21-13:24:01:926 (855) 2 [Utility.Event] System appears to have failed over from node ip-10-2ab-1cd-133.us-gov-west-1.compute.internal
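When repeating restart tests like this one, it can help to scan each instance's log for these two warnings automatically. The following is a minimal Python sketch, assuming the log lines have the format shown above; the function and pattern names are hypothetical, and the matched message text is copied from the log excerpt.

```python
import re
from typing import Iterable, List, Tuple

# Message text taken from the log lines in this article.
FAILED_OVER = re.compile(r"System appears to have failed over from node (\S+)")
COPIED = re.compile(r"Mirroring not started, this instance appears to have been copied")


def mirror_startup_warnings(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (kind, detail) pairs for mirror-related startup warnings.

    For a failover warning, detail is the previous node name; for a
    "copied" warning, detail is the whole log line.
    """
    found: List[Tuple[str, str]] = []
    for line in lines:
        match = FAILED_OVER.search(line)
        if match:
            found.append(("failed_over_from", match.group(1)))
        elif COPIED.search(line):
            found.append(("copied", line.strip()))
    return found
```

Because the function accepts any iterable of lines, an open file object for the instance's log can be passed directly, and a test script can flag any restart that produced either warning.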
My latest attempt to solve this problem was to have the container entrypoint script update the failover IP addresses in the iris.cpf file before IRIS starts. Even with this change, I still see the “appears to have failed over from…” message:
08/21/21-03:53:09:791 (794) 2 [Utility.Event] System appears to have failed over from node ip-10-2ab-1cd-170.us-gov-west-1.compute.internal
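The idea behind that last attempt can be sketched in Python as follows, assuming the old and new address of each member are already known (for example, from the files described earlier) and that IRIS is not running while iris.cpf is edited. The function name is hypothetical, and it deliberately does not assume any particular layout for the MapMirrors entries: it only substitutes whole IP addresses inside sections whose names begin with MapMirrors. All substitutions happen in a single pass, so a swap such as ip.168 moving from failover1 to failover2 does not overwrite an address that was just written. As the log line above shows, this kind of rewrite alone did not make the warning go away.

```python
import re
from typing import Dict, List


def update_mapmirrors_addresses(cpf_text: str, replacements: Dict[str, str]) -> str:
    """Replace old IP addresses with new ones, only inside [MapMirrors...] sections.

    replacements maps old address -> new address. Lines in other sections,
    and the section header lines themselves, are returned unchanged.
    """
    if not replacements:
        return cpf_text
    # One combined pattern so every substitution happens in one pass, which keeps
    # address swaps (A -> B and B -> A) from clobbering each other. The lookarounds
    # stop 10.0.0.1 from matching inside 10.0.0.133.
    alternatives = "|".join(
        re.escape(old) for old in sorted(replacements, key=len, reverse=True)
    )
    pattern = re.compile(r"(?<![\d.])(" + alternatives + r")(?![\d.])")

    out: List[str] = []
    in_mapmirrors = False
    for line in cpf_text.splitlines(keepends=True):
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            # A section header: remember whether we are now inside a MapMirrors section.
            in_mapmirrors = stripped[1:].startswith("MapMirrors")
        elif in_mapmirrors:
            line = pattern.sub(lambda m: replacements[m.group(1)], line)
        out.append(line)
    return "".join(out)
```

The section contents used to exercise a function like this should be treated as placeholders; a real iris.cpf should be backed up before it is rewritten.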
If I cannot be assured that I will have one Primary and one Backup, I do not consider IRIS Mirror reliable, so maybe it is just IIS Mirror?
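The requirement in that last sentence, exactly one Primary and one Backup, can be stated as a simple check. The following minimal Python sketch assumes the Mirror Status of each failover member has already been collected by some other means (not shown) and that the mirror has exactly two failover members; the function name is hypothetical, and the status strings are the ones used in this article.

```python
from collections import Counter
from typing import Mapping


def has_one_primary_one_backup(statuses: Mapping[str, str]) -> bool:
    """statuses maps each failover member's name to its Mirror Status string."""
    counts = Counter(statuses.values())
    # Two failover members, one in each role; anything else (e.g. Primary + Stopped) fails.
    return len(statuses) == 2 and counts["Primary"] == 1 and counts["Backup"] == 1
```

Under this check, the outcome described earlier, with one member Primary and the other Stopped, is reported as unhealthy.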