Replies

To add to this, given that you're concerned with security and want to use TLS 1.2, you should strongly consider upgrading, as 2012.1.2 has a number of security issues that have been fixed over the years.

I just want to note that 2015 kits are not available for download anymore, even for supported customers. If anyone has a specific need for a 2015 kit, please contact the WRC.

Are the two mirror members functioning as primary and backup? No members will even attempt to connect to the arbiter until you have a primary and an active backup (because that is the only situation where the arbiter is used). If you do have a primary and active backup, and they aren't connecting to the arbiter, I'd suggest contacting the WRC so that we can take a look at this with you.

Is the database journaled? Remote non-mirrored databases on mirrored ECP database servers can only be mounted read-write on the ECP application server (in this case the reporting async) if the database is NOT journaled on the database server. This is documented here: https://docs.intersystems.com/latest/csp/docbook/Doc.View.cls?KEY=GHA_mi...

"Select the database you want to access over this ECP channel from the list of remote databases. You can select both mirrored databases (databases listed as :mirror:mirror_name:mirror_DB_name) and nonmirrored databases (databases listed as :ds:DB_name); only mirrored databases remain accessible to the application server in the event of mirror failover. When the data server is a failover member, mirrored databases are added as read-write, and nonmirrored databases are added as read-only, if journaled, or read-write, if not journaled; when the data server is a DR async member, all databases are added as read-only."

I don't know the answer to this, but since you haven't gotten any responses here, I'd suggest opening a case with the WRC so someone can investigate it fully. Please include the $zv string from both instances involved, as well as the location of the installation (as I suspect that is relevant to the problem you're seeing).

This isn't enough information to answer definitively, as we don't really know whether there is a limitation on the client side or the server side. You should check IRIS' messages.log file. My best guess is that the errors may be due to running out of licenses. Either way, I think it's worth engaging the WRC on this, as a lot of setup-specific information is going to be required to get to the bottom of it.

That routine ultimately uses the same API under the covers: ##class(SYS.Mirror).BecomePrimary(), which forces down the original primary.

I just want to note that, although this script works at the basic level, using systemd with Caché/IRIS is not going to be a perfect experience. systemd relies on the fact that it's the only method being used for stopping and starting the service. If the instance is stopped or started via another method, such as a direct 'iris start/stop/force' by a user, or an 'iris force' invoked by the ISCAgent during a mirror failover, systemd will lose track of the actual status of the instance. Certainly, if you want to use this on a test system for basic functionality, you can, but I would definitely not recommend this on a live system.

This is what reporting async members are for. You can run an archiving task on the async to move the data to another database (or different globals in the same database, however you want to do it), then purge the data on the primary. As long as the reporting process knows where to look for the data, you're all set.

Your real problem here is the memory usage on the system. It may or may not be Caché using up all the memory, and that's where your investigation should focus, but I wanted to give a technical explanation here for why the write daemon specifically is getting killed. 

Most of the memory used by Caché is allocated at instance startup, and is a 'shared memory segment', which you can see with 'ipcs'. Other (Caché and non-Caché) processes allocate memory for individual processing, but the vast majority of memory used by Caché on a running system is this shared memory segment. The largest chunk of that shared memory segment is almost always global buffers (where database blocks are stored for access by Caché processes). Anytime a database block is updated, it is updated in global buffers, and the write daemon will need to access that block in memory and write it to disk. Therefore, the write daemon ends up touching a huge amount of memory on a system, although almost all of that memory is shared. The Linux out of memory killer doesn't prioritize processes using individual memory vs. accessing shared memory segments, so the write daemon is almost always its first target (as it has accessed the most memory), even though killing that process doesn't actually free up much memory for the system (since that shared memory segment doesn't get freed until all other Caché processes detach from it).