Hi David,

We make a large number of metrics available from within the IRIS instance via a REST API.  The API can be used to integrate with Azure Monitor or any other third-party monitoring solution that supports REST.  The exact metrics to use, and the specific threshold values for them, will depend largely on your application.  

As a starting point, I would suggest the following as a minimum:  

  • cpu_usage
  • db_freespace
  • db_latency
  • glo_ref_per_sec
  • glo_update_per_sec
  • jrn_block_per_sec
  • license_percent_used
  • phys_mem_percent_used
  • phys_reads_per_sec
  • phys_writes_per_sec
  • process_count
  • system_alerts_new
  • wd_cycle_time

The metrics collected are agnostic to running in Azure, AWS, or on-prem, so they are useful in any deployment scenario.  Here's a link to all of the standard metrics available within IRIS and their descriptions:

https://docs.intersystems.com/irislatest/csp/docbook/DocBook.UI.Page.cls?KEY=GCM_rest#GCM_rest_metrics
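
As a rough illustration (a minimal sketch, not an official integration), here is one way to poll the metrics endpoint and pick out a few of the values listed above for forwarding to Azure Monitor or another tool.  The host, port, and /api/monitor/metrics path are placeholders to verify against your instance, and authentication may be required depending on your security settings:

    # Minimal sketch: poll the IRIS metrics endpoint and pick out a few metrics.
    # Adjust host/port and add authentication if your instance requires it.
    import urllib.request

    METRICS_URL = "http://iris-host:52773/api/monitor/metrics"   # placeholder host/port
    WATCHED = {"cpu_usage", "license_percent_used", "system_alerts_new"}

    with urllib.request.urlopen(METRICS_URL) as resp:
        for line in resp.read().decode("utf-8").splitlines():
            if not line or line.startswith("#"):
                continue                                  # skip blanks and comments
            name, _, value = line.partition(" ")
            base = name.split("{", 1)[0]                  # drop any {labels}
            if base.startswith("iris_"):                  # output names may be prefixed
                base = base[len("iris_"):]
            if base in WATCHED:
                print(base, value)                        # forward to your monitoring tool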

You can also create application-specific metrics.  The details can be found here:

https://docs.intersystems.com/irislatest/csp/docbook/DocBook.UI.Page.cls?KEY=GCM_rest#GCM_rest_metrics_application

I hope this helps.  

Thanks,
Mark B-

Hi Eriks,

Specific to your questions about why you cannot achieve 200MB/s, there are some physical reasons why this is the case.  First, a file copy is a completely different IO operation: it is performed with larger block-size requests and is 100% sequential, benefiting from the file cache and/or storage controller cache along with NTFS read-ahead prediction.  

In a Caché SQL query, Caché (or IRIS) does 8KB block reads, which are presumably random in nature as well, depending on the query and the data/global structure, so any caching is mostly limited to whatever you have defined for the database cache (global buffers) in the Caché instance.  Since this is 5.0.21, I wouldn't expect your installation to have hundreds of GBs of global buffers (and I would not recommend that on 5.0.21 either), so you are at the mercy of the disk latency of a single process doing random 8KB reads, not the total throughput you see in a file copy operation.  

So, based on the ~20MB/sec you are seeing, you are getting about 2500 8KB IOPS, or 0.4ms single-process storage latency - this is actually very good performance for a single process.  As you add more jobs in parallel you start approaching other limits in the IO chain, such as SCSI queue depths at the VM layer, at the VMware ESXi layer, etc., and it becomes more an IO operation limitation than a throughput (MB/s) limitation.

I hope this helps explain the situation you are seeing.  It is expected behavior, because the ~20MB/s you see is just a factor of the storage latency for a single process (0.4ms): a maximum of ~2500 IOPS * 8KB IO size = ~20MB/sec.
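
To spell the arithmetic out (illustrative only):

    # Single synchronous reader: latency determines IOPS, and IOPS * IO size = throughput.
    latency_s = 0.0004                        # ~0.4 ms per random 8KB read
    iops = 1 / latency_s                      # one read at a time -> ~2500 IOPS
    throughput_mb_s = iops * 8 * 1024 / 1e6   # 8KB per IO -> ~20 MB/s
    print(round(iops), round(throughput_mb_s, 1))   # 2500 20.5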

Kind regards,

Mark B-

Hi Jason,

Thank you for your post.  We provide a storage performance utility called RANREAD.  It actually uses HealthShare/Ensemble (also Caché and InterSystems IRIS) to generate the workload, rather than relying on an external tool to try to simulate what HealthShare/Ensemble might do.  You can find the details in this community article here.

Kind regards,

Mark B-

Hi Mack,

I can help here.  The VMS/Itanium system you are migrating from is quite old and has quite slow processors.  For something like this you can figure at least 4 of the McKinley cores (maybe more) to a single current-model Intel Xeon E5 v4 series core.  I would look at a server such as a single-socket system with an Intel Xeon E5-2667v4 processor and 64GB of RAM (more RAM doesn't hurt either).  The E5-2667v4 is an 8-core processor @ 3.2GHz, which is far more CPU than you would need; however, it's actually quite difficult to get a smaller server these days.  

For a workload like this, a virtual machine in vSphere, Hyper-V, or KVM would probably be more appropriate.

Also, I have a few comments on your current Caché configuration:

  • The amount of routine buffer space you have configured (3584MB) exceeds the maximum allowed (only 1023MB).  You can confirm in your cconsole.log that startup actually reduced it to the maximum value.  You will want to update your routine cache size to 1023MB so that it takes effect on the next Caché restart.
  • I see you have 512MB of 2KB database buffers allocated and 43496MB of 8KB buffers.  I would suggest removing the 2KB buffer allocation completely and just allowing any 2KB databases you have to use the 8KB buffers.  That way you aren't artificially capping your database cache (see the example settings after this list).
  • Speaking of 2KB databases, if you still actually have 2KB databases on your system, it is highly recommended to convert them to 8KB databases for data safety and performance reasons.  
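
For illustration, and if memory serves on the CPF format, those two changes correspond to the routines and globals parameters in the [config] section of cache.cpf, where the globals value lists the buffer allocations in MB by block size (2KB,4KB,8KB,16KB,32KB,64KB).  Please confirm the exact format against the documentation for your Caché version, and ideally make the change through the Management Portal rather than hand-editing:

    [config]
    routines=1023
    globals=0,0,43496,0,0,0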

Kind regards,

Mark B-

Hi Steve,

There are multiple ways to accomplish this, and it really depends on the security policies of a given organization.  You can do as you have outlined in the original post, you can do as Dmitry has suggested, or you can even take it a step further and provide an external-facing DMZ (eDMZ) and an internal DMZ (iDMZ).  The eDMZ contains only the load balancer, with firewall rules allowing only HTTPS access and load balancing only to the web servers in the iDMZ, and the iDMZ in turn has firewall rules that allow only TLS connections to the super server ports on the APP servers, which sit behind all of the firewalls.

Here is a sample diagram describing the eDMZ/iDMZ/Internal network layout.
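
In rough text form, that flow looks like this (component names are generic placeholders):

    Internet
       |   HTTPS only (firewall)
    eDMZ:     load balancer
       |   HTTPS to web servers only (firewall)
    iDMZ:     web servers
       |   TLS to super server ports only (firewall)
    Internal: APP servers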

So, as you can see, there are many ways this can be done, and the manner in which to provide network security is up to the organization.  It's good to point out that InterSystems technologies can support many different network security methodologies, from the most simple to very complex designs, depending on what the application and organization require.

Kind Regards,

Mark B