Mark Bolinsky · Aug 13, 2024

Great article!  I use your AWS and Azure examples all the time with HealthShare and TrakCare deployments!!!  This GCP example will surely be beneficial as well!

Mark Bolinsky · Jan 18, 2023

Message delivery depends on queuing, so messages on the compute node(s) may not yet have completed their path through the production, especially the outbound send to the Message Bank. The Message Bank will only "bank" the messages that have actually been sent to it. Messages still queued in the production will remain in the production queues until those pods are restarted, unless PreStop hooks are used to give the pod a grace period on container shutdown until all queues are empty. An Interoperability Production is a stateful set, and its queues are required to support message delivery guarantees.
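A PreStop hook can run a script that waits for the production queues to drain before the container stops. Below is a minimal Python sketch of that wait loop. It assumes a hypothetical `get_total_queue_depth` callable that you would implement against your own monitoring of the production queues (it is not an IRIS API). The timeout should stay below the pod's `terminationGracePeriodSeconds`, otherwise Kubernetes will kill the container before the loop finishes.

```python
import time
from typing import Callable


def wait_for_queues_to_drain(
    get_total_queue_depth: Callable[[], int],  # hypothetical: total messages queued in the production
    timeout_s: float = 120.0,
    poll_interval_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until every production queue is empty or the timeout expires.

    Returns True if the queues drained, False if the timeout was reached.
    Keep timeout_s below the pod's terminationGracePeriodSeconds.
    """
    deadline = clock() + timeout_s
    while True:
        if get_total_queue_depth() == 0:
            return True
        if clock() >= deadline:
            return False
        sleep(poll_interval_s)
```

The `sleep` and `clock` parameters exist only so the loop can be exercised without real waiting. In a pod, the PreStop `exec` command would run a script that calls this function before the container receives its stop signal.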

Mark Bolinsky · Jan 19, 2022

Hi David,

We make a large number of metrics available from within the IRIS instance via a REST API. The REST API can be used to integrate with Azure Monitor or any other third-party monitoring solution that supports REST. The exact metrics to use, along with specific threshold values, will depend largely on your application.

As a starting point, I would suggest the following as a minimum:  

  • cpu_usage
  • db_freespace
  • db_latency
  • glo_ref_per_sec
  • glo_update_per_sec
  • jrn_block_per_sec
  • license_percent_used
  • phys_mem_percent_used
  • phys_reads_per_sec
  • phys_writes_per_sec
  • process_count
  • system_alerts_new
  • wd_cycle_time

The metrics collected are agnostic to running in Azure, AWS, or on-prem, so they are useful in any deployment scenario. Here's a link to all the standard metrics available within IRIS, along with their descriptions:
https://docs.intersystems.com/irislatest/csp/docbook/DocBook.UI.Page.cls?KEY=GCM_rest#GCM_rest_metrics

You can also create application specific metrics.  The details can be found here:
https://docs.intersystems.com/irislatest/csp/docbook/DocBook.UI.Page.cls?KEY=GCM_rest#GCM_rest_metrics_application
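As an illustration of consuming these metrics from a third-party tool, here is a minimal Python sketch that parses Prometheus-style text and keeps only the starter set above. It assumes the IRIS metrics endpoint (commonly `/api/monitor/metrics`) returns that format without timestamps, and that metric names carry an `iris_` prefix (e.g. `iris_cpu_usage`). Exact names can differ slightly between versions, so check them against the documentation linked above.

```python
from typing import Dict, List, Tuple

STARTER_METRICS = [
    "cpu_usage", "db_freespace", "db_latency", "glo_ref_per_sec",
    "glo_update_per_sec", "jrn_block_per_sec", "license_percent_used",
    "phys_mem_percent_used", "phys_reads_per_sec", "phys_writes_per_sec",
    "process_count", "system_alerts_new", "wd_cycle_time",
]


def parse_metrics(text: str) -> List[Tuple[str, str, float]]:
    """Parse Prometheus text-format lines into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blank lines and HELP/TYPE comments
            continue
        series, _, value = line.rpartition(" ")  # value is the last space-separated field
        name, brace, labels = series.partition("{")
        samples.append((name, brace + labels, float(value)))
    return samples


def select_starter_metrics(text: str, prefix: str = "iris_") -> Dict[str, float]:
    """Keep only the starter metrics; the 'iris_' prefix is an assumption."""
    wanted = {prefix + m for m in STARTER_METRICS}
    return {name + labels: value
            for name, labels, value in parse_metrics(text) if name in wanted}
```

In practice, the text would come from something like `urllib.request.urlopen(...)` against your instance's metrics URL. The resulting dictionary can then be forwarded to Azure Monitor or another collector.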

I hope this helps.  

Thanks,
Mark B-

Mark Bolinsky · Apr 19, 2021

Hi Anzelem,

We have a very strict Hardware Compatibility List (HCL) for HealthShare and TrakCare, covering our preferred solutions based on vendor benchmark testing and live sites. We encourage all our HealthShare and TrakCare customers to stick to the HCL to ensure predictable performance and the high availability customers would expect from our software.

Regarding HPE Synergy 480 blades, these are just traditional blade servers and nothing all that different, as long as they use one of our recommended processors and have the network/storage adapters for all-flash SAN storage. With hyper-converged infrastructure (HCI) solutions, the storage architecture and management are the key factor (and a potential pain point), because some HCI solutions are OK and others not so much.

I hope this helps.

Regards,

Mark B-

Mark Bolinsky · Jul 9, 2020

Hi Eriks,

Specific to your questions about why you cannot achieve 200MB/s, there are specific physical reasons why this is the case. First, your file copy is a completely different IO operation. It uses larger block-size requests and is 100% sequential, benefiting from file cache and/or storage controller cache along with NTFS read-ahead prediction.

In a Caché SQL query, Caché (or IRIS) will do 8KB block reads. These are likely random as well, depending on the query and the data/global structure, so any caching will mostly be limited to whatever you have defined for the database cache (global buffers) in the Caché instance. Since this is 5.0.21, I wouldn't expect your installation to have hundreds of GBs of global buffers (and I would not recommend that on 5.0.21 either). So you are at the mercy of the disk latency of a single process doing random 8KB reads, not the total throughput you see in a file copy operation.

So, the ~20MB/sec you are seeing indicates you are getting about 2500 8KB IOPS, or 0.4ms single-process storage latency. This is actually very good performance for a single process. As you add more jobs in parallel, you start approaching other limits in the IO chain, such as SCSI queue depths at the VM layer, at the VMware ESXi layer, etc. At that point it becomes more an IO operation limitation than a throughput (MB/s) limitation.

I hope this helps explain the situation you are seeing. It is expected behavior: the ~20MB/s you see is simply a function of the storage latency for a single process (0.4ms). That latency gives a maximum of ~2500 IOPS × 8KB IO size = ~20MB/sec.
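The same arithmetic can be written as a small Python helper, which is handy for sanity-checking other latency readings. It assumes a single process issuing one synchronous 8KB read at a time, with no parallelism or read-ahead. It also uses decimal units (1MB = 1000KB) to match the figures above.

```python
from typing import Tuple


def single_process_throughput(latency_ms: float, block_kb: float = 8.0) -> Tuple[float, float]:
    """Return (IOPS, MB/s) for one process issuing synchronous reads back to back."""
    iops = 1000.0 / latency_ms            # one IO completes every latency_ms milliseconds
    mb_per_s = iops * block_kb / 1000.0   # decimal MB, matching the ~20MB/sec figure
    return iops, mb_per_s


iops, mbps = single_process_throughput(0.4)
print(f"{iops:.0f} IOPS, {mbps:.1f} MB/s")  # 2500 IOPS, 20.0 MB/s
```

The only way to raise that ceiling for a single process is lower latency. Parallel jobs raise aggregate IOPS until the queue-depth limits mentioned above are reached.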

Kind regards,

Mark B-

Mark Bolinsky · Mar 9, 2020

They used to be available on our website, but have since been removed because the results were from 3 years ago. The summary results from 2015 and 2017 have been included in graph-1 above in this new report for comparison. Thanks.

Mark Bolinsky · Mar 4, 2020

Correct. Gold 6252 series (aka "Cascade Lake") supports both DCPMM and DRAM. However, keep in mind that when using DCPMM you still need DRAM, and you should adhere to at least an 8:1 ratio of DCPMM:DRAM.

Mark Bolinsky · Mar 4, 2020

Hi Eduard,

Thanks for your questions.

1- At small scale I would stay with traditional DRAM. DCPMM becomes beneficial at capacities above 1TB.

2- That was DDR4 DRAM in both the read-intensive and write-intensive Server #1 configurations. In the read-intensive server configuration it was specifically DDR-2400, and in the write-intensive server configuration it was DDR-2600.

3- The read-intensive workload uses different CPU configurations because this testing is meant to demonstrate upgrade paths from older servers to new technologies, and the scalability gains offered in that scenario. The write-intensive workload only used a different server in the first test, to compare the previous generation to the current generation with DCPMM. The three following results then demonstrate the performance differences within the same server, just with different DCPMM configurations.

4- Thanks. I will see what happened to the link and correct it.

Mark Bolinsky · Nov 19, 2019

Hi all,

Please note that these scripts are also usable with IRIS. In each of the 'pre' and 'post' scripts, you only need to change each "csession <CACHE INSTANCE> ..." reference to "iris session <IRIS INSTANCE> ...".
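If there are many scripts to convert, the substitution can be automated. This minimal Python sketch makes three assumptions. First, the IRIS form of the call is `iris session <IRIS INSTANCE>`. Second, the instance name follows `csession` after whitespace. Third, `instance_map` is a hypothetical optional mapping you supply if the instance was renamed. Review the converted scripts before use.

```python
import re
from typing import Dict, Optional

# Matches "csession", the whitespace after it, and the instance name that follows.
_CSESSION = re.compile(r"\bcsession(\s+)(\S+)")


def convert_to_iris(script_text: str, instance_map: Optional[Dict[str, str]] = None) -> str:
    """Rewrite 'csession <INSTANCE>' as 'iris session <INSTANCE>' (optionally renaming it)."""
    mapping = instance_map or {}

    def repl(match: "re.Match[str]") -> str:
        instance = match.group(2)
        return f"iris session{match.group(1)}{mapping.get(instance, instance)}"

    return _CSESSION.sub(repl, script_text)


before = 'csession CACHE -U %SYS "##Class(Backup.General).ExternalFreeze()"'
print(convert_to_iris(before, {"CACHE": "IRIS"}))
```

The word boundary keeps identifiers that merely contain "csession" (such as a variable named `mycsession`) from being rewritten.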

Regards,
Mark B-

Mark Bolinsky · Nov 7, 2019

This is certainly a good option as well. However, this approach still carries some risk in case there are actual issues with the backup/snapshot and you actually want a failover to occur. This is a good example of the numerous options available.

Mark Bolinsky · Nov 6, 2019

Using Veeam backup/snapshot is very common with Caché and IRIS, and when using the snapshot process there are a couple of things to be aware of:

1. Make sure you are NOT including the VM's memory state, as this significantly lengthens VM stun times.

2. Make sure you are current with VMware vSphere patches, as there are some known issues with snapshot performance and data consistency in older versions of vSphere. I would recommend being on vSphere 6.7 or later.

3. Make sure your journal disk is on a different VMDK from any of your CACHE.DATs and CACHE.WIJ. This matters especially right after you thaw the instance. A large burst of writes may occur, flooding/serializing IO on the device and potentially blocking or slowing down journal writes (...and triggering a premature mirror failover because of it).

4. You definitely need to use the ExternalFreeze/Thaw APIs to ensure the CACHE.DATs within the snapshot are "clean".

5. Confirm your current QoS timeout value, as some earlier versions of Caché had a very low QoS value. With snapshots, I believe it should be set to 8 seconds and should not exceed 30 seconds.

The links Peter mentioned are also very good references for more details.
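The five-point checklist above can be encoded as a pre-flight check that runs before a snapshot job. This is a minimal Python sketch over a hypothetical `SnapshotConfig` description of the VM. The field names are illustrative and do not come from Veeam, vSphere, or Caché; you would populate them from your own inventory. The QoS bounds follow the 8 to 30 second guidance in item 5.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass
class SnapshotConfig:  # hypothetical description of one VM's snapshot setup
    include_memory_state: bool
    vsphere_version: Tuple[int, int]
    journal_vmdks: Set[str]
    database_and_wij_vmdks: Set[str]
    uses_external_freeze_thaw: bool
    qos_timeout_s: int


def preflight_warnings(cfg: SnapshotConfig) -> List[str]:
    """Return one warning per checklist item that the configuration violates."""
    warnings = []
    if cfg.include_memory_state:
        warnings.append("Exclude the VM memory state: it lengthens VM stun time.")
    if cfg.vsphere_version < (6, 7):
        warnings.append("Upgrade to vSphere 6.7 or later.")
    shared = cfg.journal_vmdks & cfg.database_and_wij_vmdks
    if shared:
        warnings.append(f"Journal shares VMDK(s) with CACHE.DAT/CACHE.WIJ: {sorted(shared)}")
    if not cfg.uses_external_freeze_thaw:
        warnings.append("Call ExternalFreeze/ExternalThaw around the snapshot.")
    if not 8 <= cfg.qos_timeout_s <= 30:
        warnings.append(f"QoS timeout {cfg.qos_timeout_s}s is outside the 8-30 second range.")
    return warnings
```

An empty list means the configuration passes. Anything else can be logged or used to abort the backup job before the VM is stunned.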

Mark Bolinsky · Jul 31, 2019

Hi Alexey,

I can help with your question. The reason it works this way is that you can't (or at least shouldn't) have a database file (CACHE.DAT or IRIS.DAT) open in contending modes (both unbuffered and buffered), to avoid file corruption or stale data. The online backup CBK file itself can be written with buffered IO, because it is independent of the database, as you mentioned. However, the online backup utility's reads of the database blocks will be unbuffered direct IO reads. This is where the slowdown may occur: in reading the database blocks, not in writing the CBK backup file.

Regards,
Mark B-

Mark Bolinsky · Nov 28, 2018

Hi Scott,

Have you looked at using the Ensemble Enterprise Monitor? It provides a centralized "pane of glass" dashboard-type display across multiple productions. Details on using it can be found in the Ensemble documentation.

Regards,
Mark B-

Mark Bolinsky · Oct 29, 2018

Hi Ashish,

We are actively working with Nutanix on a potential example reference architecture, but nothing is imminent at this time. The challenge with HCI solutions, Nutanix being one of them, is that there is more to the solution than just the nodes themselves. The network topology and switches play a very important role.

Additionally, performance with HCI solutions is good...until it isn't. What I mean by that is that performance can be good with HCI/SDDC solutions, but the key is maintaining the expected performance during node failures and/or maintenance periods. Not all SSDs are created equal. It is therefore important to consider storage access performance in all situations, such as normal operations, failure conditions, and node rebuild/rebalancing. Data locality also plays a large role with HCI, and in some HCI solutions so does the working dataset size (i.e., a larger dataset with random access patterns can have an adverse and unexpected impact on storage latency).

Here's a link to an article I authored regarding our current experiences and general recommendations with HCI and SDDC-based solutions.

https://community.intersystems.com/post/software-defined-data-centers-sddc-and-hyper-converged-infrastructure-hci-–-important

So, in general, be careful when considering any HCI/SDDC solution not to fall for the HCI marketing hype or promises of being "low cost". Be sure to consider failure/rebuild scenarios when sizing your HCI cluster. Many times the often-quoted "4-node cluster" just isn't ideal, and more nodes may be necessary to support performance during failure/maintenance situations within a cluster. We have come across many of these situations, so test, test, test. :)
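One way to act on this when sizing is an N+F calculation. Size the cluster so that the nodes left after F failures (or F nodes in maintenance) can still carry the peak workload, with headroom for rebuild/rebalancing traffic. The Python sketch below is a simplified illustration under stated assumptions: per-node capacity scales linearly, and rebuild work consumes a flat fraction of each surviving node. It is not an InterSystems or Nutanix sizing method, and real per-node figures should come from testing.

```python
import math


def nodes_required(peak_load: float, per_node_capacity: float,
                   failures_tolerated: int = 1, rebuild_overhead: float = 0.2) -> int:
    """Smallest node count whose surviving nodes still cover the peak load.

    rebuild_overhead is the assumed fraction of each surviving node's capacity
    consumed by rebuild/rebalancing while a node is down or in maintenance.
    """
    if not 0 <= rebuild_overhead < 1:
        raise ValueError("rebuild_overhead must be in [0, 1)")
    effective = per_node_capacity * (1 - rebuild_overhead)
    surviving_needed = math.ceil(peak_load / effective)
    return surviving_needed + failures_tolerated


# Illustrative numbers only: 250k IOPS peak, 100k IOPS per node.
print(nodes_required(250_000, 100_000))  # 5
```

With these made-up inputs, a 4-node cluster would cover normal operation but not a single-node failure during a rebuild. That is the kind of situation the failure/maintenance testing above is meant to catch.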

Kind regards,

Mark B

Mark Bolinsky · Sep 20, 2018

Hello Ashish,

Great question. Yes, NetBackup is widely used by many of our customers, and using the ExternalFreeze/Thaw APIs is the best approach. Also, with your environment being on VMware ESXi 6, we can support using VMDK snapshots as part of the backup process, assuming you have the NetBackup 8.1 feature that supports VMware guest snapshots. I found the following link on NetBackup 8.1 and its support in a VMware environment:  https://www.veritas.com/content/support/en_US/doc/NB_70_80_VE

You will want to add pre/post scripts to the NetBackup backup job so that the database is frozen just prior to taking the snapshot and thawed right after it. NetBackup will then take a clean backup of the VMDKs, providing an application-consistent backup. Here is another link to an article on using ExternalFreeze/Thaw in a VMware environment:  https://community.intersystems.com/post/intersystems-data-platforms-and-performance-–-vm-backups-and-caché-freezethaw-scripts
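The pre/post steps can be wrapped in a small script that the backup job calls. Below is a minimal Python sketch of that wrapper. It assumes, as in the commonly published freeze/thaw shell scripts, that `csession <INSTANCE> -U %SYS "##Class(Backup.General).ExternalFreeze()"` exits with status 5 on success and 3 on failure, and likewise for `ExternalThaw()`. Confirm those exit codes for your version. The instance name `CACHE` is a placeholder.

```python
import subprocess
from typing import Callable, Sequence

# Assumed exit statuses, as used in commonly published freeze/thaw scripts; verify for your version.
SUCCESS_STATUS = 5
FAILURE_STATUS = 3

Runner = Callable[[Sequence[str]], int]


def run_command(cmd: Sequence[str]) -> int:
    """Run a command and return its exit status without raising on non-zero."""
    return subprocess.run(list(cmd), check=False).returncode


def call_backup_api(method: str, instance: str = "CACHE",
                    session_cmd: Sequence[str] = ("csession",),
                    runner: Runner = run_command) -> bool:
    """Invoke ##Class(Backup.General).<method>() in %SYS and interpret the exit status."""
    cmd = [*session_cmd, instance, "-U", "%SYS", f"##Class(Backup.General).{method}()"]
    status = runner(cmd)
    if status == SUCCESS_STATUS:
        return True
    if status == FAILURE_STATUS:
        return False
    raise RuntimeError(f"unexpected exit status {status} from {method}")


def freeze(**kwargs) -> bool:
    return call_backup_api("ExternalFreeze", **kwargs)


def thaw(**kwargs) -> bool:
    return call_backup_api("ExternalThaw", **kwargs)
```

The pre-snapshot script would call `freeze()` and exit non-zero if it returns False, so that NetBackup does not snapshot an unfrozen database. The post-snapshot script calls `thaw()`. On IRIS, pass `session_cmd=("iris", "session")` and your IRIS instance name.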

I hope this helps.  Please let me know if you have any questions.

Regards,

Mark B-

Mark Bolinsky · Aug 15, 2018

Hi Jason,

We are now working on a similar utility for writes, to support either a write-only or a mixed read/write workload. I hope to have it posted to the community in the next few weeks.

Kind regards,

Mark B-

Mark Bolinsky · Aug 15, 2018

Hi Jason,

Thank you for your post. We provide a storage performance utility called RANREAD. It actually uses HealthShare/Ensemble (also Caché and InterSystems IRIS) to generate the workload, rather than relying on an external tool to try to simulate what HealthShare/Ensemble might do. You can find the details in this community article here.

Kind regards,

Mark B-

Mark Bolinsky · Aug 3, 2018

Thanks Thomas.  Great article!  

One recommendation I would like to add: with VM-based snapshot backups, we recommend NOT including the VM's memory state as part of the snapshot. This greatly reduces the time a VM is "stunned" or paused, which could otherwise approach or exceed the QoS value. Excluding the memory state from the VM snapshot is fine for the database, because recovery never relies on information in memory (assuming the ExternalFreeze and ExternalThaw APIs are used appropriately). All database writes are frozen during the snapshot, although journal writes still occur.

Mark Bolinsky · Jun 14, 2018

Hi Paul,

The call-out method is highly customized and depends on the API features of a particular load balancer. Basically, code is added to the ^ZMIRROR routine to call whatever API/CLI the load balancer provides (or the EC2 CLI calls).

For the appliance polling method (the one I recommend, because it is very simple and clean), here is a section from my AWS reference architecture article. The article also provides some good diagrams showing the usage.

AWS Elastic Load Balancer Polling Method

The CSP Gateway's mirror_status.cxw page, available in 2017.1, can be used as the polling target of the ELB health monitor for each mirror member added to the ELB server pool. Only the primary mirror member will respond ‘SUCCESS’, thus directing network traffic only to the active primary mirror member.

This method does not require any logic to be added to ^ZMIRROR.  Please note that most load-balancing network appliances have a limit on the frequency of running the status check.  Typically, the highest frequency is no less than 5 seconds, which is usually acceptable to support most uptime service level agreements.

An HTTP request for the following resource will test the Mirror Member status of the LOCAL Caché configuration.

 /csp/bin/mirror_status.cxw

For all other cases, the path to these Mirror status requests should resolve to the appropriate Caché server and namespace, using the same hierarchical mechanism as that used for requesting real CSP pages.

Example:  To test the Mirror Status of the configuration serving applications in the /csp/user/ path:

 /csp/user/mirror_status.cxw

Note: A CSP license is not consumed by invoking a Mirror Status check.

Depending on whether or not the target instance is the active Primary Member, the Gateway will return one of the following CSP responses:

** Success (Is the Primary Member)
===============================
   HTTP/1.1 200 OK
   Content-Type: text/plain
   Connection: close
   Content-Length: 7

   SUCCESS

** Failure (Is not the Primary Member)
===============================
   HTTP/1.1 503 Service Unavailable
   Content-Type: text/plain
   Connection: close
   Content-Length: 6

   FAILED

** Failure (The Cache Server does not support the Mirror_Status.cxw request)
===============================
   HTTP/1.1 500 Internal Server Error
   Content-Type: text/plain
   Connection: close
   Content-Length: 6

   FAILED
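To test outside the load balancer, the same health check can be reproduced with a small client. This Python sketch uses only the standard library. Following the responses listed above, it treats an instance as the primary only when the response is HTTP 200 with the body `SUCCESS`. The 503 and 500 responses, as well as unreachable hosts, all count as "not primary".

```python
import urllib.request


def is_primary_response(status: int, body: str) -> bool:
    """Only the active primary answers HTTP 200 with the body SUCCESS."""
    return status == 200 and body.strip() == "SUCCESS"


def check_mirror_member(base_url: str, path: str = "/csp/bin/mirror_status.cxw",
                        timeout_s: float = 5.0) -> bool:
    """Poll one mirror member's status page; True only for the active primary."""
    try:
        with urllib.request.urlopen(base_url.rstrip("/") + path, timeout=timeout_s) as resp:
            return is_primary_response(resp.status, resp.read().decode("ascii", "replace"))
    except OSError:
        # HTTPError (503 not primary, 500 unsupported) and URLError (unreachable)
        # are both OSError subclasses; treat all of them as "not the primary".
        return False
```

For example, `check_mirror_member("http://<mirror member host>")` checks the local configuration. Passing `path="/csp/user/mirror_status.cxw"` checks the configuration serving `/csp/user/`.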

Mark Bolinsky · Nov 7, 2017

We are receiving more and more requests for VSS integration, so there may be some movement on it; however, there are no guarantees or commitments at this time.

Regarding the alternative of a crash-consistent backup, yes, it would be safe as long as the databases, WIJ, and journals are all included in a consistent point-in-time snapshot. The databases in the backup archive may be "corrupt", and they will not be physically accurate until Caché is started and the WIJ and journals are applied. Just like you said: a crash-consistent backup plus WIJ recovery is the key to a successful recovery.

I will post back if I hear of changes coming with VSS integration.

Mark Bolinsky · Nov 3, 2017

Hi Dean - thanks for the comment. There are no changes required from a Caché standpoint. However, Microsoft would need to add similar functionality to Windows to allow Azure Backup to call a script within the target Windows VM, as it does with Linux. Once Microsoft provides that capability, the Caché scripting would be exactly the same on Windows, except for using .BAT syntax rather than Linux shell scripting. Microsoft may already have this capability; I'll have to check whether they have extended it to Windows as well.

Regards,
Mark B-

Mark Bolinsky · Oct 25, 2017

Hi Raymond,

Thank you for your question; I can help with it. We have done a lot of testing with EC2, and the performance of an EC2 instance can vary between on-demand and reserved instances, even of the same EC2 instance type. In AWS, each vCPU reported for a given EC2 instance type is an individual thread on the processor, i.e., a "logical processor". The OS (and Ensemble/HealthShare as well, for that matter) will only see a given instance's number of vCPUs, and the OS will only schedule jobs on those vCPUs. Ensemble and HealthShare are process based, not thread based. So an instance type such as m4.xlarge with 4 vCPUs means only 4 jobs will execute in parallel at a time.

In your specific case, with the large amount of XSLT parsing and adjusting pool sizes, you will first want to determine whether FIFO is a requirement. If so, then unfortunately you need to remain at a pool size of 1 to ensure FIFO. However, if FIFO is not required in your production or a given Business Service/Process/Operation, you can set the pool sizes higher than 1 to manage the message queues. A large pool size won't change the performance of a single XSLT parse, but it will allow more messages and XSLT parses to proceed in parallel. If you see CPU utilization at 100% and the message queues continually grow, you may need a larger EC2 instance type (and a larger pool size) to accommodate the message rates.
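To illustrate the FIFO trade-off, here is a simplified Python simulation. It is not how Ensemble actually schedules its jobs: queued messages are handed out in arrival order to whichever pool job becomes free first. With a pool size of 1, messages complete in arrival order. With a larger pool, a message with a shorter processing time can finish before an earlier one, while total elapsed time drops. The processing times used are made-up values.

```python
import heapq
from typing import List, Tuple


def simulate_pool(durations: List[float], pool_size: int) -> Tuple[List[int], float]:
    """Simulate a FIFO queue served by `pool_size` identical jobs.

    Returns (message indices in completion order, total elapsed time).
    """
    if pool_size < 1:
        raise ValueError("pool_size must be at least 1")
    free_at = [0.0] * pool_size  # time each job next becomes idle (already a valid heap)
    completions = []             # (finish_time, message_index)
    for idx, duration in enumerate(durations):
        start = heapq.heappop(free_at)  # the earliest-idle job takes the next queued message
        finish = start + duration
        completions.append((finish, idx))
        heapq.heappush(free_at, finish)
    completions.sort()
    order = [idx for _, idx in completions]
    elapsed = completions[-1][0] if completions else 0.0
    return order, elapsed


print(simulate_pool([3, 1, 1], pool_size=1))  # ([0, 1, 2], 5.0)
print(simulate_pool([3, 1, 1], pool_size=2))  # ([1, 2, 0], 3.0)
```

The second run finishes sooner but delivers message 0 last, which is exactly the ordering guarantee a pool size of 1 preserves. In the real system, a pool larger than the vCPU count also stops helping once the CPU is saturated.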

I hope this helps.  

Kind regards,

Mark B-

Mark Bolinsky · Mar 22, 2017

Hi Mack,

I can help here. The VMS/Itanium system you are migrating from is quite old and has quite slow processors. For something like this, you can figure at least 4 McKinley cores (maybe more) per single core of a current Intel Xeon E5 v4 series processor. I would look at a server such as a single-socket system with an Intel Xeon E5-2667v4 processor and 64GB of RAM (more RAM doesn't hurt either). The E5-2667v4 is an 8-core processor at 3.2GHz, which is far more CPU than you would need. However, it's actually quite difficult to get a smaller server these days.

For a workload like this, a virtual machine in vSphere, Hyper-V, or KVM would probably be more appropriate.

Also, I have a few comments on your current Caché configuration:

  • The amount of routine buffers you have configured (3584MB) exceeds the maximum allowed (the max is only 1023MB). You can confirm in your cconsole.log that startup actually reduced it to the max value. You will want to update your routine cache size to 1023MB so that it takes effect on the next Caché restart.
  • I see you have 512MB of 2KB database buffers allocated and 43496MB of 8KB buffers. I would suggest removing the 2KB buffer allocation completely and just allowing any 2KB databases you have to use the 8KB buffers. That way you aren't artificially capping your database cache.
  • Speaking of 2KB databases, if you still actually have 2KB databases on your system, it is highly recommended to convert them to 8KB databases for data safety and performance reasons.
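These three checks are easy to script when reviewing several configurations. A minimal Python sketch follows. It takes the values (in MB) as plain arguments rather than reading the CPF file. The 1023MB routine-buffer ceiling is the limit stated above for this version, and the function name is illustrative.

```python
from typing import List

MAX_ROUTINE_BUFFERS_MB = 1023  # maximum routine cache stated above for this version


def review_buffers(routine_mb: int, buffers_2kb_mb: int, has_2kb_databases: bool) -> List[str]:
    """Return one recommendation per issue found in the buffer configuration."""
    advice = []
    if routine_mb > MAX_ROUTINE_BUFFERS_MB:
        advice.append(f"Routine buffers {routine_mb}MB exceed the {MAX_ROUTINE_BUFFERS_MB}MB "
                      f"maximum; set them to {MAX_ROUTINE_BUFFERS_MB}MB.")
    if buffers_2kb_mb > 0:
        advice.append(f"Remove the {buffers_2kb_mb}MB of 2KB buffers so 2KB databases "
                      "use the 8KB buffers.")
    if has_2kb_databases:
        advice.append("Convert the remaining 2KB databases to 8KB.")
    return advice


for line in review_buffers(3584, 512, has_2kb_databases=True):
    print(line)
```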

Kind regards,

Mark B-

Mark Bolinsky · Feb 22, 2017

I will revise the post to make it clearer that THP is enabled by default in the 2.6.38 kernel but may be available in prior kernels. I will also refer readers to their Linux distribution's documentation for confirming and changing the setting. Thanks for your comments.

Mark Bolinsky · Feb 22, 2017

Hi Alexander,

Thank you for your post. We are relying on what the RH documentation states about when THP was introduced into the mainline kernel (2.6.38) and enabled by default, as noted in the RH post you referenced. The option may have existed in previous kernels (although I would not recommend trying it), but it may not have been enabled by default. All the documentation I can find on THP support in RH references the 2.6.38 kernel, where it was merged as a feature.

If you are finding it in previous kernels, please confirm whether or not THP is enabled by default there; that would be interesting to know. Unfortunately, there isn't much we can do other than perform the enablement checks mentioned in the post. As the ultimate confirmation, RH and the other Linux distributions would need to update their documentation to state when this behavior was enacted in their respective kernel versions.
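For the enablement check itself, the active THP mode is the bracketed word in a sysfs file (for example `always madvise [never]`). Below is a minimal Python sketch. It reads the upstream path `/sys/kernel/mm/transparent_hugepage/enabled` and falls back to the RHEL 6 location `/sys/kernel/mm/redhat_transparent_hugepage/enabled`. Confirm the path in your distribution's documentation.

```python
import re
from pathlib import Path
from typing import Optional

THP_PATHS = (
    Path("/sys/kernel/mm/transparent_hugepage/enabled"),
    Path("/sys/kernel/mm/redhat_transparent_hugepage/enabled"),  # RHEL 6 location
)


def parse_thp_mode(content: str) -> Optional[str]:
    """Return the bracketed (active) mode, e.g. 'always', 'madvise' or 'never'."""
    match = re.search(r"\[(\w+)\]", content)
    return match.group(1) if match else None


def current_thp_mode() -> Optional[str]:
    """Read the first THP control file that exists; None if THP is not exposed."""
    for path in THP_PATHS:
        if path.exists():
            return parse_thp_mode(path.read_text())
    return None
```

A result of `always` means THP is on by default for all processes. `madvise` limits it to applications that request it, and `never` disables it.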

As I mentioned in other comments, using THP is not necessarily a bad thing and won't cause "harm" to a system, but there may be performance impacts for applications that create a large number of processes.

Kind regards,

Mark B-

Mark Bolinsky · Feb 22, 2017

Hi Alexey,

Thank you for your comment. Yes, both THP and traditional/reserved Huge_pages can be used at the same time. However, there is no benefit, and in fact systems with many (thousands of) Caché processes, especially with a lot of process creation, have shown a performance penalty in testing. The overhead of instantiating THP for those processes at a high rate can be noticeable. Your application may not exhibit this scenario and may be OK.

The goal of this article is to provide guidance for those who may not know which option is best, and/or to point out that this is a change in recent Linux distributions. You may find that THP usage is perfectly fine for your application. There is no replacement for actually testing and benchmarking your application. :)

Kind regards,

Mark B-

Mark Bolinsky · Oct 25, 2016

Hi Anzelem,

Here are the steps that need to be defined in your VCS cluster resource group with dependencies.

  • Remount the storage <-- this is not new
  • Relocate the cluster IP <-- this is not new
  • Simple VCS application/script agent to restart the ISCAgent <-- THIS IS NEW
  • ISC VCS cluster agent to start Caché <-- this is not new (make the previous step a dependency before executing)

The script to start the ISCAgent would be dependent on the storage being mounted in the first step. 
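The dependency chain can be sanity-checked by computing a start order from it. This Python sketch uses the standard library's `graphlib.TopologicalSorter` (Python 3.9+). The resource names are illustrative, not VCS resource types. It assumes the Caché agent depends on the storage, the cluster IP, and the ISCAgent script, and that the ISCAgent script depends on the storage.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# resource -> resources that must be online first (names are illustrative)
DEPENDENCIES: Dict[str, Set[str]] = {
    "storage_mount": set(),
    "cluster_ip": set(),
    "iscagent_restart": {"storage_mount"},
    "cache_agent": {"storage_mount", "cluster_ip", "iscagent_restart"},
}


def start_order(deps: Dict[str, Set[str]]) -> List[str]:
    """Return one valid online order; raises graphlib.CycleError on a cycle."""
    return list(TopologicalSorter(deps).static_order())


print(start_order(DEPENDENCIES))
```

Any order this returns brings the storage online before the ISCAgent restart and the ISCAgent before Caché. That is the property the VCS resource group dependencies need to guarantee.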

This should provide you with the full automation needed here. Let me know if there are any concerns or problems with the above steps.

Regards,

Mark B-