We expose a large number of metrics from within the IRIS instance via a REST API. The REST API can be used to integrate with Azure Monitor or any other third-party monitoring solution that supports REST. The exact metrics to use, along with specific threshold values, will depend largely on your application.
As a starting point, I would suggest the following as a minimum:
The metrics collected are agnostic to running in Azure, AWS, or on-prem, so they are useful in any deployment scenario. Here's a link to all the standard available metrics and their descriptions within IRIS: https://docs.intersystems.com/irislatest/csp/docbook/DocBook.UI.Page.cls?KEY=GCM_rest#GCM_rest_metrics
You can also create application-specific metrics. The details can be found here: https://docs.intersystems.com/irislatest/csp/docbook/DocBook.UI.Page.cls?KEY=GCM_rest#GCM_rest_metrics_application
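As a rough illustration of pulling these metrics into a monitoring script, here is a minimal Python sketch. It assumes the instance serves the metrics in Prometheus text format at `/api/monitor/metrics` under its web server base URL; check the documentation linked above for your version. The helper names (`parse_metrics`, `fetch_metrics`, `over_threshold`), the base URL, and the threshold value are all hypothetical and only there for illustration.

```python
import urllib.request


def parse_metrics(text):
    """Parse Prometheus-style 'name value' lines into a dict of floats."""
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment/HELP/TYPE lines
        # Split at the last space so label values containing spaces survive.
        name, _, value = line.rpartition(" ")
        if not name:
            continue  # no separator found, not a metric line
        try:
            metrics[name] = float(value)
        except ValueError:
            continue  # ignore lines whose value is not numeric
    return metrics


def fetch_metrics(base_url):
    """Fetch and parse metrics. The endpoint path is an assumption."""
    url = base_url.rstrip("/") + "/api/monitor/metrics"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return parse_metrics(resp.read().decode("utf-8"))


def over_threshold(metrics, thresholds):
    """Return the metrics whose value exceeds the given limit."""
    return {
        name: metrics[name]
        for name, limit in thresholds.items()
        if name in metrics and metrics[name] > limit
    }


if __name__ == "__main__":
    # Hypothetical host, port, and threshold; adjust to your environment.
    current = fetch_metrics("http://localhost:52773")
    print(over_threshold(current, {"iris_cpu_usage": 80}))
```

The same parsed dictionary could be forwarded to Azure Monitor or another tool instead of being printed; which metrics and thresholds to watch remains application-specific, as noted above.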
I hope this helps.
We have a very strict Hardware Compatibility List (HCL) for HealthShare and TrakCare that we provide for our preferred solutions, based on vendor benchmark testing and live sites. We encourage all our HealthShare and TrakCare customers to stick to the HCL to ensure the predictable performance and high availability that customers expect from our software.
With regard to HPE Synergy 480 blades, these are just traditional blade servers and nothing all that different, as long as they use one of our recommended processors and have the network/storage adapters for all-flash SAN storage. When getting into hyper-converged infrastructure (HCI) solutions, the storage architecture and management is the key factor (and a potential pain point), because some HCI solutions are OK and others not so much.
I hope this helps.
Specific to your question about why you cannot achieve 200MB/s, there are some physical reasons why this is the case. Firstly, your file copy is a completely different IO operation: it is performed with larger block size requests and is 100% sequential, so it benefits from file cache and/or storage controller cache along with NTFS read-ahead prediction.
In a Caché SQL query, Caché (or IRIS) will do 8KB block reads, and these are presumably random in nature as well, depending on the query and the data/global structure. Any caching will therefore be mostly limited to whatever you have defined for database cache (global buffers) in the Caché instance. Since this is 5.0.21, I wouldn't expect your installation to have hundreds of GBs of global buffers (and I would not recommend that on 5.0.21 either), so you are at the mercy of the disk latency of a single process doing random 8KB reads, not the total throughput you see in a file copy operation.
So, based on the ~20MB/sec you are seeing, you are getting about 2500 8KB IOPS, or 0.4ms single-process storage latency. This is actually very good performance for a single process. As you add more jobs in parallel, you start approaching other limits in the IO chain, such as SCSI queue depths at the VM layer, at the VMware ESXi layer, etc., and it's more an IO operation limitation than a throughput (MB/s) limitation.
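The arithmetic behind those figures can be sketched in a few lines of Python. This assumes a single process issuing one synchronous 8KB read at a time (so IOPS is simply the inverse of latency) and decimal units (1MB = 1000KB), which is what the ~2500 figure implies; the function names are made up for this illustration.

```python
def single_process_throughput_mb_s(latency_ms, block_kb=8):
    """Throughput of one process doing one synchronous read at a time."""
    iops = 1000.0 / latency_ms        # reads completed per second
    return iops * block_kb / 1000.0   # KB/s converted to MB/s (decimal)


def implied_latency_ms(throughput_mb_s, block_kb=8):
    """Per-read latency implied by an observed single-process throughput."""
    iops = throughput_mb_s * 1000.0 / block_kb
    return 1000.0 / iops


if __name__ == "__main__":
    # Observed ~20MB/s with 8KB reads implies roughly 0.4ms per read.
    print(implied_latency_ms(20))
    # Conversely, 0.4ms latency caps one process at roughly 20MB/s.
    print(single_process_throughput_mb_s(0.4))
```

Under the same assumptions, reaching 200MB/s from a single process would need roughly 0.04ms per read, which is why the file copy figure is not a realistic target for this workload.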
I hope this helps explain the situation you are seeing and the expected behavior: the ~20MB/s you see is simply a function of storage latency for a single process (0.4ms), which gives a maximum of ~2500 IOPS * 8KB IO size = ~20MB/sec.