For newer storage, especially all-flash, 10,000 iterations will complete too quickly. Change this to 100,000 for sustained read activity; each step should take less than a minute on SSD storage. Using the example above:

# Run RanRead against the /db/RANREAD database with 2, 4, ... 30 processes, 100,000 iterations each
for i in `seq 2 2 30`; do echo "do ##class(PerfTools.RanRead).Run(\"/db/RANREAD\",${i},100000)" | csession CACHEINSTNAME -U "%SYS"; done

Hmmm... it worked for me just now. I did notice that when I exported from the markdown editor I used, it came across as "\<!--break--\>" with the characters escaped, but editing it to "<!--break-->" worked OK on the post about the minimum monitoring and alerting solution. To be honest I had not tried it until you mentioned it. Haha, yes, switching from MD to WYSIWYG is a big mistake :(

Hi, I think you have got it. Mark and I are saying the same thing. Sometimes a picture helps. The following is based on some presentations I have seen from Intel and VMware. It shows what happens when multiple VMs are sharing a host, which is why we recommend a reserved instance. But this also illustrates hyper-threading in general.

So to recap: a hyper-thread is not a complete core. A core is a core is a core; hyper-threading does not create more physical cores. The following picture shows a workload going through four processing units (micro execution units) in a processor.

One VM is green, a second is purple, and white is idle. A lot of time is spent idle waiting for memory cache, IO, etc.

As you can see, with hyper-threading (on the right) you don't suddenly get two processors; it is not 2x, and the expectation is ~10-30% improvement in overall processing time.

The scheduler can schedule processing when there is idle time, but remember that on a system under heavy load CPU utilisation will be saturated anyway, so there is less white (idle) time.

I also received feedback on this from a coworker who has been running workloads on AWS, which I think is useful:

 "I selected the large EC2 instances which used all the available vCPU on underlying processors.   ( AWS would reserve  a few vCPUs  for management functions.  )    Thus,  a AWS 40vCPU system is roughly  equivalent to 20 core with HT  bare metal system of the same clock speed. So right,  If you want the equivalent of 4 VMware cpus , allocate at least 8 vCPUs. Also note the  clock speeds of AWS are slower than what you are typically using.  [ie the servers I use in my on premises benchmarks]. You can  use the specint rate as the comparison." 

For more on the last comment and how to compare SPECint metrics see:

https://community.intersystems.com/post/intersystems-data-platforms-and-...

Good question. InterSystems uses AWS for many different workloads and scenarios. There are so many types of server on offer that it really is a case where you must test for yourself.

As a starting point:

You really need to understand just what AWS is giving you. For a list of instance types see the link below, and note the small print: each vCPU is a hyperthread of an Intel Xeon core, except for T2 and m3.medium.

If you know your sizing on VMware or bare metal with hyperthreading enabled, and you usually need 4 cores (with hyperthreading), I would start with sizing for 8 EC2 vCPUs. Of course you will have to test this before going into production. It would be great if you came back to the community and commented after your testing.
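If it helps, here is a back-of-the-envelope sketch of that rule of thumb in shell form. The core count and SPECint_rate-per-core figures are made-up placeholders; substitute the published results for your own servers and the target instance family.

    #!/bin/bash
    # Rule-of-thumb EC2 vCPU sizing sketch. All numbers are example values only.
    CORES_NEEDED=4        # physical cores (with HT) from your on-premises sizing
    SPECINT_ONPREM=50     # SPECint_rate per core of your on-premises servers (placeholder)
    SPECINT_AWS=40        # SPECint_rate per core of the target EC2 family (placeholder)

    # 1 physical core (with HT) is roughly 2 EC2 vCPUs
    VCPUS=$(( CORES_NEEDED * 2 ))

    # Scale up if the AWS cores are slower (integer maths, rounded up)
    VCPUS=$(( (VCPUS * SPECINT_ONPREM + SPECINT_AWS - 1) / SPECINT_AWS ))

    echo "Start testing with at least ${VCPUS} vCPUs"

Treat the output purely as a starting point for the testing mentioned above.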

AWS:

https://aws.amazon.com/ec2/instance-types/

For more general information I recommend Mark's post on EC2 reference architecture.

https://community.intersystems.com/post/intersystems-technologies-amazon...

Mark also noted there is a difference between reserved instances and on-demand. Reserved is just that: dedicated resources, so you won't be contending with other VMs. If your requirement is steady and you want consistent performance, look at the reserved instances.

Hi, it would have been better to say "Database" could be different from "Journal". SPBM can be different for all VMDKs (disks), but that doesn't mean it should be. As an example, on a four-node all-flash cluster:

I am using just two storage policies for a production VM's disks.

- Policy: VSAN default: for OS and backup/scratch. Failures to tolerate (FTT)=1. Disk stripes=1. Object space reservation=0.

- Policy: Production: for database and journals. FTT=1. Disk stripes=1. Object space reservation=100.

For performance, on each production VM use separate PVSCSI adapters for OS, journal, database, and backup/scratch.

 

For non-production VMs I have a policy that makes better use of available capacity while still providing HA in the storage:

- Policy: Non-Production: for all data disks. FTT method=RAID 5/6 (Erasure coding). FTT=1. Object space reservation=0.

Note: these values are not written in stone and will depend on your requirements. While you need to think about performance, it should be great out of the box. What you must also consider is availability and capacity.

To clarify, and to answer a question asked offline, here is an example:

"Alternatively, you can enable OS-level authentication and create a Caché account for the OS user running the script."

Create a user for backup functions in the operating system, named backup or similar, and add a user with the same name in Caché.

Assign an appropriate role to the new Caché user based on your security requirements (for example, you can test with the %All role).

Then enable the system-wide parameter that allows OS authentication and follow the steps in Configuring for Operating-System–based Authentication (%Service_Terminal on Unix or %Service_Console on Windows).

The advantage of using a standard user name is that you have a consistent approach for all instances.
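For example, a minimal sketch of what the finished setup looks like; the OS user name "backup" and the instance name CACHEINSTNAME are placeholders:

    # 1. Create the OS user (name is only an example)
    sudo useradd backup

    # 2. Create a Caché user with the same name and an appropriate role
    #    (via the Management Portal, or programmatically in the %SYS namespace)

    # 3. With OS authentication enabled for %Service_Terminal, a script run as
    #    "backup" can open a session without prompting for or storing a password:
    echo 'write "Connected as ",$username,!' | csession CACHEINSTNAME -U "%SYS"

The point is that the backup script never has a password embedded in it.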

Hi, LVM and VMware approach to snapshots are very different. But the way we interact with them with freeze/thaw is similar.

  • freeze Caché / snapshot (VM, LVM, array, etc.) / thaw Caché / backup something... / etc.

LVM presents a view of the volume (your data) at the instant of the snapshot; you can then copy (back up) all or selected files in that view somewhere else. If you look at the snapshot volume's filesystem, for example with ls -l, you will see all your files as they were back at the snapshot instant. In LVM2 the snapshot can be read/write, which is why I say you should mount it read-only for backups. If you look at the parent you see your files as they are now. You must have unused space in the volume group to allocate to the snapshot volume (created with the lvcreate command). And yes, if the snapshot volume fills up with changed data it becomes useless and is discarded, so you need to understand the data change rate at your chosen backup time; a bit of testing should tell you that. There are videos and more help via $google.

Think of a VMware server instance as just a bunch of files on the datastore disks which encapsulate the whole server including OS, drivers, Caché, data, etc. The VMware delta disk is where all block changes are written after the snapshot is taken. The parent files are the VM at the instant of the snapshot. You do not have the simple 'view back in time' capability that LVM has. The delta file(s) are written to the VM datastore, so there must be space for that too. But that's not as fiddly as LVM because you probably have a lot of spare space on the datastore; you still need to plan for it as well, though!

Which backup uses more space really depends on how long the snapshot hangs around. Snapshot files grow with the volume of changed blocks. Either way, you want to delete snapshots as soon as the backup finishes to minimise the space used. Smart VMware utilities that can do incremental backups through CBT (Changed Block Tracking) will probably be quickest.

To back up only selected files/filesystems on logical volumes (for example, a filesystem on LVM2), the snapshot process and freeze/thaw scripts can still be used and would be just about the same.

As an example, the sequence of events is:

  • Start the process, e.g. via a script scheduled with cron.
  • Freeze Caché via script as above.
  • Create snapshot volume(s) with lvcreate.
  • Thaw Caché via script as above.
  • Mount the snapshot filesystem(s) (for safety, mount read-only).
  • Back up the snapshot files/filesystems to somewhere else…
  • Unmount the snapshot filesystem(s).
  • Remove snapshot volume(s) with lvremove.

Assuming the above is scripted with appropriate error traps, this will work for virtual or physical systems.
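For illustration, here is a minimal sketch of that sequence. The instance name, volume group, mount point, and backup target are placeholders, the error handling is deliberately bare, and the exit-code checks (5 = success, 3 = failure) assume the convention described in the ExternalFreeze/ExternalThaw class documentation, so verify them against your version before relying on this.

    #!/bin/bash
    # Sketch only: names, sizes, and paths are placeholders. Test before production use.
    set -u

    INST=CACHEINSTNAME
    VG_LV=/dev/vgdata/db            # logical volume holding the database filesystem
    SNAP_DEV=/dev/vgdata/dbsnap
    MNT=/mnt/dbsnap
    TARGET=/backup/$(date +%Y%m%d)  # wherever "somewhere else" is for you

    # 1. Freeze Caché (assumed exit codes: 5 = frozen, 3 = failed)
    csession $INST -U '%SYS' "##Class(Backup.General).ExternalFreeze()"
    if [ $? -ne 5 ]; then
        echo "Freeze failed - aborting" >&2
        exit 1
    fi

    # 2. Create the snapshot volume; 10G is the space reserved for blocks that
    #    change while the snapshot exists - size it for your data change rate.
    lvcreate --snapshot --size 10G --name dbsnap $VG_LV
    SNAP_OK=$?

    # 3. Thaw Caché immediately, whether or not the snapshot succeeded.
    csession $INST -U '%SYS' "##Class(Backup.General).ExternalThaw()"
    if [ $? -ne 5 ]; then
        echo "Thaw failed - investigate immediately" >&2
        exit 1
    fi
    if [ $SNAP_OK -ne 0 ]; then
        echo "Snapshot creation failed - nothing to back up" >&2
        exit 1
    fi

    # 4. Mount the snapshot read-only and copy it somewhere else.
    mkdir -p $MNT $TARGET
    mount -o ro $SNAP_DEV $MNT
    rsync -a $MNT/ $TARGET/

    # 5. Remove the snapshot as soon as the backup finishes to free the space.
    umount $MNT
    lvremove -f $SNAP_DEV

Note that the thaw happens right after lvcreate, before the (potentially long) copy, so Caché is only frozen for the seconds it takes to create the snapshot.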

There are many resources on the web for explaining LVM snapshots. A few key points are:

LVM snapshots use a different copy-on-write approach from VMware. VMware writes changes to the delta disk and merges them back when the snapshot is deleted, which has an impact that is manageable but must be considered, as explained above. For LVM snapshots, at snapshot creation LVM creates a pool of blocks (the snapshot volume) which also contains a full copy of the LVM metadata of the volume. When writes happen to the main volume, the block being overwritten is copied to this pool on the snapshot volume and the new block is written to the main volume. So the more data that changes between when the snapshot was taken and the current state of the main volume, the more space is consumed by that snapshot pool, which is why you must consider the data change rate in your planning. When an access comes for a specific block, LVM knows which block to access.
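You can watch how full that snapshot pool is getting while the backup runs; the volume names here match the hypothetical ones in the sketch above:

    # Data% (lvs) or "Allocated to snapshot" (lvdisplay) show how much of the
    # snapshot's reserved space has been consumed by copied blocks; at 100%
    # the snapshot is invalidated and the backup from it is useless.
    lvs /dev/vgdata/dbsnap
    lvdisplay /dev/vgdata/dbsnap | grep -i 'allocated to snapshot'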

As with VMware, best practice for production systems is not to keep multiple snapshots of the same volume: every time you write to a block in the main volume you potentially trigger writes in every single snapshot in the tree, and for the same reason accessing a block can be slower.

Deleting a single snapshot is very fast. LVM just drops the snapshot pool.