Message delivery depends on queuing, so consider what happens if messages on the compute node(s) haven't completed their path through the production - especially the outbound send to the Message Bank.  The Message Bank will only "bank" messages that have actually been sent to it.  If there are still messages queued in the production, they will remain in the production queues until those pods are restarted, or you can use PreStop hooks to give the pod a grace period on container shutdown until all queues are empty.  An Interoperability Production is a stateful workload (hence the StatefulSet), and the queues are required to support message delivery guarantees.  

Hi David,

We make a large number of metrics available from within the IRIS instance via a REST API.  The REST API can be used to integrate with Azure Monitor or any other 3rd-party monitoring solution that supports REST.  The exact metrics to monitor, and the threshold values to alert on, will depend largely on your application.  

As a starting point, I would suggest the following as a minimum:  

  • cpu_usage
  • db_freespace
  • db_latency
  • glo_ref_per_sec
  • glo_update_per_sec
  • jrn_block_per_sec
  • license_percent_used
  • phys_mem_percent_used
  • phys_reads_per_sec
  • phys_writes_per_sec
  • process_count
  • system_alerts_new
  • wd_cycle_time

The metrics collected are agnostic to running in Azure, AWS, or on-prem, so they are useful in any deployment scenario.  Here's a link to all of the standard metrics available within IRIS and their descriptions:

https://docs.intersystems.com/irislatest/csp/docbook/DocBook.UI.Page.cls?KEY=GCM_rest#GCM_rest_metrics
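
For example, here's a minimal sketch (in Python) of pulling these metrics over REST.  The host, port, and credentials are placeholders for your environment, and the WATCH set is just an example; /api/monitor/metrics is the standard metrics endpoint, which returns Prometheus exposition format:

```python
import requests

BASE_URL = "http://iris-host:52773"   # placeholder host / web server port
WATCH = {"cpu_usage", "license_percent_used", "phys_mem_percent_used"}  # example subset

def scrape_metrics():
    """Pull the standard metrics from the /api/monitor/metrics endpoint."""
    # If your /api/monitor web application requires authentication,
    # add auth=("user", "password") or the appropriate token here.
    resp = requests.get(f"{BASE_URL}/api/monitor/metrics", timeout=5)
    resp.raise_for_status()
    metrics = {}
    for line in resp.text.splitlines():
        # Prometheus exposition format: "#" comments, then "name{labels} value" lines
        if not line.strip() or line.startswith("#"):
            continue
        name, _, value = line.rpartition(" ")
        metrics[name] = float(value)
    return metrics

if __name__ == "__main__":
    for name, value in scrape_metrics().items():
        # Metric names carry an "iris_" prefix in the REST output
        base = name.split("{")[0].removeprefix("iris_")
        if base in WATCH:
            print(base, value)   # forward these to Azure Monitor or your tool of choice
```

From there it's just a matter of shipping the values to whichever monitoring solution you use and setting thresholds that make sense for your application.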

You can also create application-specific metrics.  The details can be found here:

https://docs.intersystems.com/irislatest/csp/docbook/DocBook.UI.Page.cls?KEY=GCM_rest#GCM_rest_metrics_application

I hope this helps.  

Thanks,
Mark B-

Hi Anzelem,

We have a very strict Hardware Compatibility List (HCL) for HealthShare and TrakCare that defines our preferred solutions, based on vendor benchmark testing and live sites.  We encourage all our HealthShare and TrakCare customers to stick to the HCL to ensure the predictable performance and high availability customers expect from our software.

Regarding the HPE Synergy 480 blades: these are traditional blade servers and nothing out of the ordinary, as long as they use one of our recommended processors and have the network/storage adapters to connect to all-flash SAN storage.  When getting into hyper-converged infrastructure (HCI) solutions, the storage architecture and its management is the key factor (and potential pain point), because some HCI solutions handle it well and others not so much.

I hope this helps.

Regards,

Mark B-

Hi Eriks,

Specific to your question about why you cannot achieve 200MB/s, there are some specific physical reasons why this is the case.  Firstly, your file copy is a completely different IO operation: it is performed with much larger block-size requests, is 100% sequential, and benefits from the file cache and/or storage controller cache along with NTFS read-ahead prediction.  

A Caché (or IRIS) SQL query will do 8KB block reads, presumably random in nature depending on the query and the data/global structure, so any caching will be mostly limited to whatever you have defined for the database cache (global buffers) in the Caché instance.  Since this is 5.0.21, I wouldn't expect your installation to have hundreds of GBs of global buffers (nor would I recommend that on 5.0.21), so you are at the mercy of the disk latency of a single process doing random 8KB reads, not the total throughput you see in a file copy operation.  

So the ~20MB/sec you are seeing indicates you are getting about 2,500 8KB IOPS, or ~0.4ms single-process storage latency - this is actually very good performance for a single process.  As you add more jobs in parallel you start approaching other limits in the IO chain, such as SCSI queue depths at the VM layer, at the VMware ESXi layer, etc., and it becomes more an IO-operation limitation than a throughput (MB/s) limitation.

I hope this helps explain the situation you are seeing.  It is expected behavior, because the ~20MB/s you see is simply a function of single-process storage latency (~0.4ms): a maximum of ~2,500 IOPS * 8KB IO size = ~20MB/sec.
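
As a quick sanity check on that arithmetic (just an illustration, using the 0.4ms latency and 8KB block size from above):

```python
block_size_kb = 8        # Caché/IRIS database blocks are read in 8KB requests
latency_s = 0.0004       # ~0.4ms observed single-process read latency

iops = 1 / latency_s                           # one synchronous read at a time
throughput_mb_s = iops * block_size_kb / 1024  # convert KB/s to MB/s

print(f"{iops:.0f} IOPS -> {throughput_mb_s:.1f} MB/s")   # ~2500 IOPS -> ~19.5 MB/s
```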

Kind regards,

Mark B-

Using Veeam backup/snapshots is very common with Caché and IRIS, and when using the snapshot process there are a few things to be aware of:

1. Make sure you are NOT including the VM's memory state, as this will significantly lengthen VM stun times.

2. Make sure you are current with VMware vSphere patches, as there are some known issues with snapshot performance and data consistency in older versions of vSphere.  I would recommend being on vSphere 6.7 or above.

3. Make sure your journal disk is on a different VMDK than any of your CACHE.DATs and the CACHE.WIJ, especially for the moment you thaw the instance, because a large burst of writes may flood/serialize IO on the device and potentially block or slow down journal writes (...and trigger a premature mirror failover because of it).

4. You definitely need to use the ExternalFreeze/Thaw APIs to ensure the CACHE.DATs within the snapshot are "clean". 

5. Confirm your current QoS timeout value, as some earlier versions of Caché shipped with a very low QoS default; with snapshots I believe it should be set to 8 seconds and should not exceed 30 seconds.

The links that Peter mentioned are also very good references for more details.

Hello Ashish,

Great question.  Yes, NetBackup is widely used by many of our customers, and using the ExternalFreeze/Thaw APIs is the best approach.  Also, with your environment being on VMware ESXi 6, we can support using VMDK snapshots as part of the backup process, assuming you have the NetBackup 8.1 feature that supports VMware guest snapshots.  I found the following link covering NetBackup 8.1 support in a VMware environment:  https://www.veritas.com/content/support/en_US/doc/NB_70_80_VE

You will want pre/post scripts added to the NetBackup backup job so that the database is frozen prior to taking the snapshot, and then thawed right after the snapshot.  NetBackup will then take a clean backup of the VMDKs, providing an application-consistent backup.  Here is another link to an article on using ExternalFreeze/Thaw in a VMware environment:  https://community.intersystems.com/post/intersystems-data-platforms-and-performance-–-vm-backups-and-caché-freezethaw-scripts
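
As a rough illustration of what those pre/post scripts might do, here is a minimal sketch in Python.  The instance name "CACHE" is a placeholder, csession must be on the path, and the exit-status convention (5 = success) follows the example scripts in the article linked above - treat this as a starting point, not a production-ready script:

```python
import subprocess
import sys

INSTANCE = "CACHE"   # placeholder instance name; adjust for your installation

def run_backup_api(method):
    """Invoke a Backup.General class method in %SYS via csession and return its exit status."""
    return subprocess.call(
        ["csession", INSTANCE, "-U", "%SYS", f"##Class(Backup.General).{method}()"]
    )

def pre_backup():
    # Freeze: suspends database writes so the VMDK snapshot is application consistent.
    # Per the scripts in the linked article, an exit status of 5 indicates success.
    if run_backup_api("ExternalFreeze") != 5:
        sys.exit("ExternalFreeze failed - aborting the snapshot")

def post_backup():
    # Thaw: resumes normal write daemon activity once the snapshot has completed.
    if run_backup_api("ExternalThaw") != 5:
        sys.exit("ExternalThaw failed - check cconsole.log")

if __name__ == "__main__":
    # Called by the NetBackup job: "freeze" as the pre-script, "thaw" as the post-script.
    action = sys.argv[1] if len(sys.argv) > 1 else ""
    pre_backup() if action == "freeze" else post_backup()
```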

I hope this helps.  Please let me know if you have any questions.

Regards,

Mark B-

Hi Jason,

Thank you for your post.  We provide a storage performance utility called RANREAD.  It uses HealthShare/Ensemble (also Caché and InterSystems IRIS) itself to generate the workload, rather than relying on an external tool trying to simulate what HealthShare/Ensemble might do.  You can find the details in this community article.

Kind regards,

Mark B-

Hi Raymond,

Thank you for your question.  I can help here.  We have done a lot of testing with EC2, and the performance of an EC2 instance will vary between on-demand and reserved instances, even for the same EC2 instance type.  In AWS, each vCPU reported for a given EC2 instance type is an individual thread on the processor, i.e. a "logical processor".  The OS (and Ensemble/HealthShare, for that matter) will only see the instance's number of vCPUs, and the OS will only schedule jobs on those as it sees them.  Ensemble and HealthShare are process based - not thread based - so an instance type such as an m4.large with 4 vCPUs means only 4 jobs will execute in parallel at a time.

In your specific case, with the large amount of XSLT parsing and the question of adjusting pool sizes, you will first want to determine whether FIFO is a requirement; if so, then unfortunately you need to remain at a pool size of 1 to ensure FIFO.  However, if FIFO is not required in your production or for a given Business Service/Process/Operation, you can adjust the pool sizes to values higher than 1 to manage the message queues.  A larger pool size won't impact the performance of a single XSLT parse, but it will allow more messages and XSLT parses to run in parallel.  If you see CPU utilization at 100% and the message queues continue to grow, you may need a larger EC2 instance type (and a larger pool size) to accommodate the message rates.

I hope this helps.  

Kind regards,

Mark B-

Hi Mack,

I can help here.  The VMS/Itanium system you are migrating from is quite old and has slow processors by today's standards.  For something like this you can figure at least 4 of the McKinley cores (maybe more) per single current-model Intel Xeon E5 v4 series core.  I would look at a server such as a single-socket system with an Intel Xeon E5-2667v4 processor and 64GB of RAM (more RAM doesn't hurt either).  The E5-2667v4 is an 8-core processor @ 3.2GHz, which is far more CPU than you would need; however, it's actually quite difficult to get a smaller server these days.  

For a workload like this, a virtual machine in vSphere, Hyper-V, or KVM would probably be more appropriate.

Also, I have a few comments on your current Caché configuration:

  • The amount of routine buffers you have configured (3584MB) exceeds the maximum allowed (only 1023MB).  You can confirm in your cconsole.log that startup actually reduced it to the maximum.  You will want to update your routine cache size to 1023MB so that the setting takes effect on the next Caché restart.
  • I see you have 512MB of 2KB database buffers allocated and 43496MB of 8KB buffers.  I would suggest removing the 2KB buffer allocation completely and just allowing any 2KB databases you have to use the 8KB buffers.  That way you aren't artificially capping your database cache.
  • Speaking of 2KB databases, if you still actually have 2KB databases on your system, it is highly recommended to convert them to 8KB databases for data safety and performance reasons.  

Kind regards,

Mark B-

Hi Anzelem,

Here are the steps that need to be defined in your VCS cluster resource group with dependencies.

  • Remount the storage <— this is not new
  • Relocate the cluster IP <— this is not new
  • Simple VCS application/script agent to restart the ISCAgent <— THIS IS NEW
  • ISC VCS cluster agent to start Caché <— this is not new (make the previous step a dependency before executing)

The script to start the ISCAgent would be dependent on the storage being mounted in the first step. 

This should provide you with the full automation needed here.  Let me know if there are any concerns or problems with the above steps.

Regards,

Mark B-

Hi Steve,

There are multiple ways to accomplish this, and the right one really depends on the security policies of a given organization.  You can do as you have outlined in the original post, you can do as Dmitry has suggested, or you can even take it a step further and provide an external-facing DMZ (eDMZ) and an internal DMZ (iDMZ).  The eDMZ contains only the load balancer, with firewall rules allowing HTTPS access only to load-balance to the web servers in the iDMZ, and the iDMZ in turn has firewall rules that only allow TLS connections to the super server ports on the APP servers behind all firewalls.

Here is a sample diagram describing the eDMZ/iDMZ/Internal network layout.

So, as you can see there are many ways this can be done, and the manner in which to provide network security is up to the organization.  It's good to point out that InterSystems technologies can support many different methodologies of network security from the most simple to very complex designs depending on what the application and organization would require.

Kind Regards,

Mark B

Hi all, I'd like to offer some input here.  Ensemble workloads are traditionally mostly updates when used purely for message ingestion, some transformations, and delivery to one or more outbound interfaces.  As a result, expect to see low physical read rates (as reported in ^mgstat or ^GLOSTAT); however, if there are additional workloads such as reporting or applications built alongside the Ensemble productions, they may drive a higher rate of physical reads.  

As a general rule for sizing memory for Ensemble, we use 4GB of RAM for each CPU (physical or virtual) and then use 50-75% of that RAM for global buffers.  So on a 4-core system, the recommendation is 16GB of RAM with 8-12GB allocated to global buffers.  This leaves 4-8GB for the OS kernel and the Ensemble processes.  With very large memory configurations (>64GB), using the 75% end of the range rather than only 50% is ideal, because the OS kernel and processes won't need that much memory.
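
Expressed as a quick, purely illustrative sizing helper:

```python
def ensemble_memory_sizing(cores):
    """Rule of thumb: 4GB of RAM per core, 50-75% of that RAM for global buffers."""
    ram_gb = cores * 4
    # On very large configurations (>64GB), lean toward the 75% end of the range,
    # since the OS kernel and other processes won't need the extra headroom.
    low_pct = 0.75 if ram_gb > 64 else 0.50
    return {
        "ram_gb": ram_gb,
        "global_buffers_gb": (round(ram_gb * low_pct), round(ram_gb * 0.75)),
    }

# Example: a 4-core system -> 16GB of RAM with 8-12GB of global buffers
print(ensemble_memory_sizing(4))   # {'ram_gb': 16, 'global_buffers_gb': (8, 12)}
```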

One additional note: we highly recommend the use of huge_pages (Linux) or Large_pages (Windows) to provide much more efficient memory management.  

Hello,

I cannot name specific customers; however, this is a configuration used with TrakCare and TrakCare Lab deployments (prior to TrakCare Lab Enterprise, which now integrates lab directly as a module into a single TrakCare instance), where TrakCare and TrakCare Lab are each separate failover mirror sets and TrakCare Analytics is defined as a single Reporting Async mirror member serving as the source data to build/support the TrakCare Analytics DeepSee cubes and dashboards in a single instance.

This is our standard architecture for TrakCare based deployments.  I hope this helps.  Please let me know if there are specific questions or concerns with this deployment model.

Kind regards,

Mark B-

Yes, database mirroring within cloud infrastructure is possible.  As you point out, the use of a virtual IP address (VIP) is in most cases not doable.  This is because cloud network management/assignment rules don't particularly like IP addresses changing outside of the cloud management facilities.

Having said that, 3rd-party load balancers offer a solution in the form of a virtual appliance, available in most cloud marketplaces in a Bring-Your-Own-License (BYOL) model - the F5 LTM Virtual Edition, for example.  With these appliances there are usually two methods available to control network traffic flow.  

The first option uses an API called from ^ZMIRROR during failover to instruct the load balancer that a particular server is now the primary mirror member.  The API methods range from CLI type scripting to REST API integration.

The second option uses load balancer polling to determine which mirror member is primary.  This involves creating a simple CSP page or listening socket to respond whether a given server in the load balanced pool is the primary mirror member.

The second option is more portable and load balancer agnostic, since it doesn't rely on specific syntax or integration methods from a given load balancer vendor or model.  However, the limitation is the polling frequency.  In most cases the polling interval can be as low as a few seconds, which in most scenarios is acceptable.
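
To make the second option concrete, here is a minimal sketch of the kind of poll a load balancer health monitor would perform.  The member addresses and the /csp/bin/mirror_status.csp path are purely hypothetical examples of a status page you would build yourself (for instance, one that returns HTTP 200 only when $SYSTEM.Mirror.IsPrimary() is true); this is not a built-in endpoint:

```python
import requests

MEMBERS = ["10.0.1.10", "10.0.1.11"]          # hypothetical mirror member addresses
STATUS_PATH = "/csp/bin/mirror_status.csp"    # hypothetical status page you publish

def find_primary():
    """Return the first member whose status page reports it is the primary, if any."""
    for host in MEMBERS:
        try:
            # The status page should answer 200 only on the current primary member.
            r = requests.get(f"http://{host}{STATUS_PATH}", timeout=2)
            if r.status_code == 200:
                return host
        except requests.RequestException:
            continue   # member unreachable; treat it as not primary
    return None

if __name__ == "__main__":
    print("Primary mirror member:", find_primary())
```

A real load balancer monitor does essentially the same check on its own schedule and directs traffic only to the pool member that answers successfully.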

I will soon be posting a longer article here on the Community detailing some examples using F5 LTM VE, including a sample CSP status page and REST API integration to cover both options mentioned above.  I will also be presenting a session on this during our upcoming Global Summit.