Hi Eriks,

Specific to your question about why you cannot achieve 200MB/s, there are some physical reasons why this is the case.  First, a file copy is a completely different IO operation: it uses much larger block-size requests, is 100% sequential, and benefits from the file cache and/or storage controller cache along with NTFS read-ahead prediction.

For a Caché SQL query, Caché (or IRIS) does 8KB block reads that are presumably random in nature as well, depending on the query and the data/global structure, so any caching is mostly limited to whatever you have defined for the database cache (global buffers) in the Caché instance.  Since this is 5.0.21, I wouldn't expect your installation to have hundreds of GBs of global buffers (and I would not recommend that on 5.0.21 either), so you are at the mercy of the disk latency seen by a single process doing random 8KB reads, not the total throughput you see in a file copy operation.

So the ~20MB/sec you are seeing indicates you are getting about 2500 8KB IOPS, or 0.4ms single-process storage latency - this is actually very good performance for a single process.  As you add more jobs in parallel you start approaching other limits in the IO chain, such as SCSI queue depths at the VM layer, at the VMware ESXi layer, etc., and it becomes more of an IO-operation limitation than a throughput (MB/s) limitation.
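
For anyone who wants to check the arithmetic, here is a minimal sketch; the latency and block size are the assumed values from above, not anything this snippet measures:

```
# Back-of-the-envelope single-process throughput from latency and block size
awk 'BEGIN {
    latency_ms = 0.4                     # assumed single-process read latency
    block_kb   = 8                       # Cache/IRIS database block size
    iops       = 1000 / latency_ms       # ~2500 random reads per second
    mb_per_s   = iops * block_kb / 1024  # ~19.5 MB/sec
    printf "~%.0f IOPS x %dKB = ~%.1f MB/s\n", iops, block_kb, mb_per_s
}'
```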

I hope this helps explain the situation you are seeing - it is expected behavior, because the ~20MB/s is simply a function of single-process storage latency (0.4ms): max IOPS (~2500) * 8KB IO size = ~20MB/sec.

Kind regards,

Mark B-

They used to be available on our website, but have since been removed because the results were from 3 years ago.  The summary results from 2015 and 2017 are included in graph-1 above in this new report for comparison.  Thanks.

Correct.  The Gold 6252 series (aka "Cascade Lake") supports both DCPMM and DRAM.  However, keep in mind that when using DCPMM you still need DRAM, and you should adhere to at least an 8:1 ratio of DCPMM:DRAM.
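
As a quick, hypothetical sizing illustration (the 8:1 figure is just the guideline mentioned above; the capacity is an example, not a recommendation):

```
# Hypothetical sizing helper based on the 8:1 DCPMM:DRAM guideline above
dcpmm_gb=1024                        # e.g. 1TB of DCPMM
dram_gb=$(( (dcpmm_gb + 7) / 8 ))    # minimum DRAM to stay at or below 8:1
echo "${dcpmm_gb}GB DCPMM -> at least ${dram_gb}GB DRAM"
```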

Hi Eduard,

Thanks for your questions.

1- On a small scale I would stay with traditional DRAM.  DCPMM becomes beneficial at >1TB of capacity.

2- That was DDR4 DRAM in both the read-intensive and write-intensive Server #1 configurations.  In the read-intensive server configuration it was specifically DDR-2400, and in the write-intensive server configuration it was DDR-2600.

3- There are different CPUs in the read-intensive workload configurations because this testing is meant to demonstrate upgrade paths from older servers to new technologies, and the scalability increases offered in that scenario.  The write-intensive workload only used a different server in the first test, to compare the previous generation to the current generation with DCPMM.  The three following results then demonstrated the differences in performance within the same server - just with different DCPMM configurations.

4- Thanks.  I will see what happened to the link and correct it.

Hi all,

Please note that these scripts are also usable with IRIS.  In each of the 'pre' and 'post' scripts you only need to change each of the "csession <CACHE INSTANCE> ..." references to "iris session <IRIS INSTANCE> ...".
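
For example, a typical freeze call in a pre-script would change like this (the instance names and the ExternalFreeze call shown here are placeholders - adjust to whatever your scripts actually run):

```
# Cache form of the line (example only)
csession CACHE1 -U%SYS "##Class(Backup.General).ExternalFreeze()"

# IRIS form of the same line (example only)
iris session IRIS1 -U%SYS "##Class(Backup.General).ExternalFreeze()"
```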

Regards,
Mark B-

This is certainly a good option as well; however, there is still some risk associated with it if there are actual issues with the backup/snapshot and you actually do want a failover to occur.  It is a good example of how many options are available.

Using Veeam backup/snapshot is very common with Caché and IRIS, and when using the snapshot process there are a few things to be aware of:

1. Make sure you are NOT including the VM's memory state, as this will significantly lengthen VM stun times.

2. Make sure you are current with VMware vSphere patches, as there are some known issues with snapshot performance and data consistency in older versions of vSphere.  I would recommend being on at least vSphere 6.7.

3. Make sure your journal disk is on a different VMDK than any of your CACHE.DATs and the CACHE.WIJ, especially because after you thaw the instance a large burst of writes may occur and flood/serialize IO on the device, potentially blocking or slowing down journal writes (...and triggering a premature mirror failover because of it).

4. You definitely need to use the ExternalFreeze/Thaw APIs to ensure the CACHE.DATs within the snapshot are "clean" (see the sketch after this list).

5. Confirm your current QoS timeout value, as some earlier versions of Caché shipped with a very low QoS value; with snapshots I believe it should be set to 8 seconds and should not exceed 30 seconds.
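
Regarding point 4, here is a minimal sketch of what a Linux pre-freeze hook typically does (CACHE1 is a placeholder instance name, and the exit codes 5/3 are the success/failure statuses documented for Backup.General.ExternalFreeze - verify against the documentation for your version):

```
#!/bin/sh
# Pre-freeze hook sketch: freeze the instance, thaw and abort if the freeze fails
csession CACHE1 -U%SYS "##Class(Backup.General).ExternalFreeze()"
status=$?
if [ $status -eq 5 ]; then
    echo "instance is frozen - snapshot may proceed"
elif [ $status -eq 3 ]; then
    echo "freeze failed - thawing and aborting snapshot"
    csession CACHE1 -U%SYS "##Class(Backup.General).ExternalThaw()"
    exit 1
fi
```

The matching post-thaw script simply calls ##Class(Backup.General).ExternalThaw() the same way.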

The links that Peter mentioned are also very good references for more details.

Hi Alexey,

I can help with your question.  The reason it is this way is that you can't (or at least shouldn't) have a database file (CACHE.DAT or IRIS.DAT) opened in contending modes (both unbuffered and buffered) without risking file corruption or stale data.  The actual writing of the online backup CBK file can be a buffered write because it is independent of the DB, as you mentioned, but the reads of the database blocks by the online backup utility will be unbuffered direct IO reads.  This is where the slow-down may occur: in reading the database blocks, not in writing the CBK backup file.
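
If you want to see the difference on your own storage, here is a rough illustration (the path is a placeholder, this uses GNU dd on Linux, and it is not the same code path as the online backup utility - just a way to compare buffered vs. direct reads of the same file):

```
# Buffered sequential read - benefits from the OS file cache and read-ahead
dd if=/data/testfile.dat of=/dev/null bs=1M count=1024

# Unbuffered (direct) read - every request must be satisfied by the storage device
dd if=/data/testfile.dat of=/dev/null bs=8k count=131072 iflag=direct
```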

Regards,
Mark B-