Murray Oldfield · Oct 26, 2017

I also received feedback on this from a coworker who has been running workloads on AWS, which I think is useful:

 "I selected the large EC2 instances which used all the available vCPU on underlying processors.   ( AWS would reserve  a few vCPUs  for management functions.  )    Thus,  a AWS 40vCPU system is roughly  equivalent to 20 core with HT  bare metal system of the same clock speed. So right,  If you want the equivalent of 4 VMware cpus , allocate at least 8 vCPUs. Also note the  clock speeds of AWS are slower than what you are typically using.  [ie the servers I use in my on premises benchmarks]. You can  use the specint rate as the comparison." 

For more on the last comment and how to compare SPECint metrics see:

https://community.intersystems.com/post/intersystems-data-platforms-and…

Murray Oldfield · Oct 25, 2017

Good question. InterSystems uses AWS for many different workloads and scenarios. There are so many types of server on offer that it really is a case where you must test for yourself.

As a starting point:

You really need to understand just what AWS is giving you. For a list of instance types see the link below, and note the small print: "Each vCPU is a hyperthread of an Intel Xeon core except for T2 and m3.medium."

If you know your sizing on VMware or bare metal with hyperthreading enabled, and you usually need 4 cores (with hyperthreading), I would start by sizing for 8 EC2 vCPUs. Of course you will have to test this before going into production. It would be great if you came back to the community and commented after your testing.
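As a trivial sketch of that rule of thumb (the 4-core figure is just the example above):

~~~
# Each EC2 vCPU is a hyperthread of a core, so as a first estimate
# double the hyperthreaded core count you sized on VMware or bare metal
CORES_WITH_HT=4
echo "Start testing with $(( CORES_WITH_HT * 2 )) EC2 vCPUs"
~~~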

AWS:

https://aws.amazon.com/ec2/instance-types/

For more general information I recommend Mark's post on EC2 reference architecture.

https://community.intersystems.com/post/intersystems-technologies-amazo…

Mark also noted there is a difference between reserved instances and on-demand. Reserved is just that – dedicated resources, so you won't be contending with other VMs. If your requirement is consistent and you want consistent performance, look at the reserved instances.

Murray Oldfield · Aug 16, 2017

Hi Anzelem, no. But I'm very interested to hear of community readers' experience with any HCI solution, either through the comments or directly.

Murray Oldfield · Jun 14, 2017

Correction June 2017: A VMware engineer has pointed out that the rule for monster VMs, "Configure VMs with 1 vCPU per socket", is no longer best practice. I have added a link to current best practice and rules of thumb at the appropriate section.

Murray Oldfield · Jun 13, 2017

Hi Mikhail, very nice! And very useful.

Have you done any work on templating, e.g. so one or more systems may be selected to be displayed on one dashboard?

Murray Oldfield · May 11, 2017

If you are watching this page: I have checked some changes into GitHub to make it easier to chart multiple days, e.g. a week's worth of pButtons at a time.

Murray Oldfield · Mar 14, 2017

Hi, It would have been better to say "Database" could be different to "Journal". SPBM can be different for every VMDK (disk), but that doesn't mean it should be. As an example, on a four-node all-flash cluster:

I am using just two storage policies for production VM disks.

- Policy: VSAN default: for OS and backup/scratch. Failures to tolerate (FTT)=1. Disk stripes=1. Object space reservation=0.

- Policy: Production: for database and journals. FTT=1. Disk stripes=1. Object space reservation=100.

For performance, on each production VM use separate PVSCSI adapters for OS, journal, database, and backup/scratch.

For non-production VMs I have a policy that makes better use of available capacity; there is still HA in the storage:

- Policy: Non-Production: for all data disks. FTT method=RAID 5/6 (Erasure coding). FTT=1. Object space reservation=0.

Note: these values are not written in stone and will depend on your requirements. While you need to think about performance, it should be great out of the box. What you must also consider is availability and capacity.

Murray Oldfield · Mar 5, 2017

To clarify, and to answer a question asked offline, here is an example:

"Alternatively, you can enable OS-level authentication and create a Caché account for the OS user running the script."

Create a user for backup functions in the operating system, named backup or similar, and add a user with the same name in Caché.

Assign an appropriate role to the new Caché user based on your security requirements (for example, you can test with the %All role).

Then enable the system parameter to allow OS authentication. Follow the steps in Configuring for Operating-System–based Authentication (%Service_Terminal on Unix or %Service_Console for Windows).
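As a minimal sketch on Linux (the instance name CACHE is illustrative, and the Security.Users call should be checked against the class reference for your version):

~~~
# Create the OS user that will run the backup script (the name is an example)
sudo useradd backup

# Create a matching Caché user and assign a role by piping ObjectScript into
# csession -- %All is for testing only, use a more restrictive role in production
echo 'set sc = ##class(Security.Users).Create("backup","%All") write sc' | csession CACHE -U %SYS
~~~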

The advantage of using a standard user name is that you have a consistent approach for all instances.

Murray Oldfield · Feb 20, 2017

Sort of obvious, but remember: as well as enabling OS-level authentication, also create a Caché account for the OS user.

Murray Oldfield · Jan 23, 2017

Very Cool! Better than my dodgy old perl scripts and gnuplot ;)

Just a couple of hints for others like me who bumbled through some of the installation steps (On MacOS Sierra, 10.12.2).

I needed to use sudo, i.e. run sudo pip install matplotlib and sudo pip install pandas

I also used Anaconda to install and run jupyter notebook

Murray Oldfield · Jan 20, 2017

Hi, the LVM and VMware approaches to snapshots are very different, but the way we interact with them with freeze/thaw is similar.

  • freeze Caché/snapshot (VM, LVM, array, etc.)/thaw Caché/backup something.../etc.

LVM presents a view of the volume (your data) at the instant of the snapshot; you can then copy (back up) all or selected files in that view somewhere else. If you look at the snapshot volume's filesystem, for example with ls -l, you will see all your files as they were at the snapshot instant. In LVM2 the snapshot can be read/write, which is why I say you should mount it read-only for backups. If you look at the parent, you see your files as they are now. You must have unused space in the volume group to allocate to the snapshot volume (created with the lvcreate command). Yes, if the snapshot volume fills up with changed data it becomes useless and is discarded, so you need to understand the data change rate at your chosen backup time; a bit of testing should tell you that. There are videos and more help via $google.
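For example (volume and mount names are illustrative):

~~~
# Create a copy-on-write snapshot volume of /dev/vg00/data,
# sized to absorb the expected change rate during the backup window
lvcreate --snapshot --size 10G --name data_snap /dev/vg00/data

# Mount the point-in-time view read-only and look around
mkdir -p /mnt/data_snap
mount -o ro /dev/vg00/data_snap /mnt/data_snap
ls -l /mnt/data_snap   # files exactly as they were at the snapshot instant
~~~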

Think of a VMware server instance as just a bunch of files on the datastore disks that encapsulate the whole server: OS, drivers, Caché, data, etc. The VMware delta disk is where all block changes are written after the snapshot starts; the parent files are the VM at the instant of the snapshot. You do not have the simple 'view back in time' capability that LVM has. The delta file(s) are written to the VM datastore, so there must be space for that too, but that's not as fiddly as with LVM because you probably have a lot of spare space on the datastore -- you still need to plan for it as well!

Which backup uses more space really depends on how long the snapshot hangs around; snapshot files grow with the volume of changed blocks. Either way, you want to delete snapshots as soon as the backup finishes to minimise the space used. Smart VMware utilities that can do incremental backups through CBT will probably be quickest.

Murray Oldfield · Jan 18, 2017

To back up only selected files/filesystems on logical volumes (for example, a filesystem on LVM2), the snapshot process and freeze/thaw scripts can still be used and would be just about the same.

As an example, the sequence of events is:

  • Start the process, e.g. via a script scheduled with cron.
  • Freeze Caché via script as above.
  • Create snapshot volume(s) with lvcreate.
  • Thaw Caché via script as above.
  • Mount the snapshot filesystem(s) (for safety, mount read-only).
  • Back up the snapshot files/filesystems to somewhere else…
  • Unmount the snapshot filesystem(s).
  • Remove snapshot volume(s) with lvremove.

Assuming the above is scripted with appropriate error traps, this will work for virtual or physical systems.
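A minimal bash sketch of that sequence; the volume names, mount points, snapshot size, and freeze/thaw scripts are placeholders, and real error traps are left as an exercise:

~~~
#!/bin/bash
set -e                      # stop on the first error (add real traps for production)

VG=vg00                     # volume group (example)
LV=data                     # logical volume holding the database files (example)
SNAP=${LV}_snap
MNT=/mnt/${SNAP}

./cache_freeze.sh           # freeze Caché writes (script as above)
lvcreate --snapshot --size 10G --name ${SNAP} /dev/${VG}/${LV}
./cache_thaw.sh             # thaw as soon as the snapshot volume exists

mkdir -p ${MNT}
mount -o ro /dev/${VG}/${SNAP} ${MNT}          # read-only for safety
rsync -a ${MNT}/ /backup/$(date +%Y%m%d)/      # back up to somewhere else...

umount ${MNT}
lvremove -f /dev/${VG}/${SNAP}
~~~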

There are many resources on the web explaining LVM snapshots. A few key points are:

LVM snapshots use a different copy-on-write approach from VMware's. VMware writes changes to the delta disk and merges them into the parent when the snapshot is deleted, which has an impact that is manageable but must be considered, as explained above. With LVM, at snapshot creation LVM creates a pool of blocks (the snapshot volume), which also contains a full copy of the volume's LVM metadata. When a write happens to the main volume, the block being overwritten is first copied to the pool on the snapshot volume, and then the new block is written to the main volume. So the more data that changes between when the snapshot was taken and the current state of the main volume, the more space is consumed by the snapshot pool; you must consider the data change rate in your planning. When an access comes for a specific block, LVM knows which block to access.
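One way to watch how full the snapshot pool is getting is the Data% column in the lvs output (the volume name is the example from above):

~~~
# If Data% reaches 100 the snapshot is invalidated and useless
lvs /dev/vg00/data_snap
~~~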

Like VMware, best practice for production systems is not to keep multiple snapshots of the same volume: every time you write to a block in the main volume you potentially trigger writes in every single snapshot in the tree. For the same reason, accessing a block can be slower.

Deleting a single snapshot is very fast. LVM just drops the snapshot pool.

Murray Oldfield · Jan 15, 2017

Hi Alexey, good question. There is no one size fits all. My aim is to highlight how external backups work so the teams responsible can evaluate the best solution when talking to vendors.

Third-party solutions will be a suite of management tools, not simply backup/restore, so there are many features to evaluate. For example, products that back up VMs will have change block tracking (CBT), so only changed blocks in the VM (not just changes to CACHE.DAT) are backed up -- in other words, incremental backups. They also include many other features, such as replication, compression, deduplication, and data exclusion, to manage what is backed up, when, and what space is required. Snapshot solutions at the storage array level have many similar functions. You can also create your own solution integrating freeze/thaw, for example using LVM snapshots.

Often a Caché application is only one of many applications and databases at a company, so usually the question is turned around to "can you back up <Caché Application x> with <vendor product y>?" Now, with knowledge of how to implement freeze/thaw, you can advise the vendor of your Caché application's requirements.

Murray Oldfield · Jan 15, 2017

Addendum: I was reminded that there is an extra step when configuring external backups.

When the security configuration requires that the backup script supply Caché credentials, you can do this by redirecting input from a file containing the needed credentials. Alternatively, you can enable OS-level authentication and create a Caché account for the OS user running the script.
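As a sketch of the redirection approach (the file location and instance name are illustrative; check the documentation for the exact input your security settings expect):

~~~
# backup-input.txt supplies the username and password lines the service
# prompts for, followed by any commands the backup script needs to run
csession CACHE -U %SYS < /secure/backup-input.txt
~~~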

Please see the online documentation for full details.

Murray Oldfield · Dec 29, 2016

I wanted to blast out the polymetric dashboard to a bunch of my lab VMs this morning, so I used Ansible. I put the code snippet here in case it's useful. The full post and introduction for using Ansible is here: Ansible Post

I admit this is a bit of a hack, but it shows some useful syntax.

polykit-install.yml

---
### Main Task - install polymetric dashboard ###
- debug: msg="Start polykit-install.yml"

- name: local path on laptop for tar.gzip of dashboard kit relative to ansible root
  set_fact:
    local_package_path: ./z-packages

- debug: 
    msg: "{{ local_package_path }}"

# hack! ASSUME only one instance per host!!!! ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ !!!!

- name: get instance name
  shell:
    ccontrol qlist | awk -F '^' '{print $1}'
  register: instance_name

- debug: var=instance_name.stdout    

- name: get instance path (add ./mgr )
  shell:
    ccontrol qlist | awk -F '^' '{print $2}'
  register: instance_path

- debug: var=instance_path.stdout    

- name: set base path
  set_fact:
    base_path: "{{ instance_path.stdout }}/mgr/polykit"

# hack! ASSUME only one instance per host!!!! ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ !!!!


# Create install directory on target
- name: Create install directory
  file:
    dest="{{ base_path }}"
    mode=0777
    state=directory

# Copy polykit file to install directory
- name: Copy polykit.tar.gz to install directory
  copy:
    src="{{ local_package_path }}/polykit-v1.0.0.tar.gz"
    dest="{{ base_path }}"
    remote_src=False
    
# Unarchive
- name: Unarchive polykit database
  unarchive:
    src="{{ base_path }}/polykit-v1.0.0.tar.gz"
    dest="{{ base_path }}"
    creates="{{ base_path }}/polykit-v1.0.0/DashboardInstaller.xml"
    remote_src=yes


# Run installer

# Build text file

- name: build csession file line at a time
  shell:
    echo "set status = \$System.OBJ.Load(\""{{ base_path }}/polykit-v1.0.0/DashboardInstaller.xml"\", \"ck\")" >"{{ base_path }}"/polykit-inst.txt
  
- shell:
    echo "set status = {{ '#' }}{{ '#' }}class(SYS.Monitor.DashboardInstaller).Install(\""{{ base_path }}/polykit-v1.0.0"\",0)" >> "{{ base_path }}"/polykit-inst.txt

- shell:
    echo "h">>"{{ base_path }}"/polykit-inst.txt

- shell:  
    cat "{{ base_path }}"/polykit-inst.txt
  register: polyinst_check
- debug: var=polyinst_check.stdout_lines

- name: Install dashboard
  shell:  
    cat "{{ base_path }}/polykit-inst.txt" | csession "{{ instance_name.stdout }}" -U %SYS
  register: polyinst_install
- debug: var=polyinst_install.stdout_lines
Murray Oldfield · Dec 29, 2016

One of the things I often do in lab environments (not production) is use the Caché ./mgr folder, because I know it's always going to exist, and it has the advantage of being a short relative path for .xml files etc.

If I copied the setup files to <cache instance directory>/mgr/polykit/polykit-v1.0.0

Then in Caché I can use a short relative path:

%SYS>set status = $System.OBJ.Load("./polykit/polykit-v1.0.0/DashboardInstaller.xml", "ck")

Murray Oldfield · Dec 28, 2016

Hi, great tool - I look forward to exploring it further. In the meantime... it appears that the chart for a sensor does not work for values >1,000,000, e.g. see glorefs/sec. The value does show on the sensor display...

Is it a simple fix to increase a graph's max value? See the example below.

[Screenshot: OK on sensor]

Murray Oldfield · Dec 12, 2016

Hi, have you investigated alternatives to Caché online backup, i.e. Caché external backup? E.g. third-party backup solutions that use snapshots?

Murray Oldfield · Dec 8, 2016

Hi Alexey, I have not seen customers implement distributed databases with Caché as you describe. I can see how flash storage is a way to increase IOPS and lower latency, so it perhaps goes some way to moving the choke point on that particular problem. Of course the problem has not gone away; we have just moved the threshold.

Murray Oldfield · Dec 7, 2016

Hi Anzelem, obviously you had some problems, and the best solution in your case was to use a single instance. My experience is that ECP is widely used where customers want a single database instance to scale beyond the resources of a single server. On a well-configured solution the impact for a user, e.g. response time for a screen, should be negligible. I have also seen cases where ECP is chosen for HA.

Bottlenecks in the infrastructure (network, storage or application) will have an impact. This is true for single-instance or ECP configurations. As noted in the post, ECP has strict requirements for network and storage that will impact performance. There are also considerations in application design.

It is true there are additional latencies when using a distributed architecture. Even on a well-resourced set-up, expect some loss of overall efficiency when comparing processing on a single server vs a distributed architecture -- four ECP application servers (CPU + memory size x) will not produce throughput equal to the total of four database servers (CPU + memory size x) running separate instances of an application. But, as above, this should not impact an individual user's experience.

Murray Oldfield · Nov 25, 2016

Hi Timur, thanks for the comments and links. I agree, 3D XPoint is a case of waiting to see real performance when it's released. Even 10x lower latency is still a big jump; the figures in the post are what is publicly talked about by Micron now. My aim is to give people a heads-up on what's coming and to look out for it (although vendors will be shouting it from the rooftops :). Hopefully we will have some real data and pricing soon.

Murray Oldfield · Nov 22, 2016

Hi, the back-of-the-envelope logic is like this:

For the old server you have 8 cores. Assuming the workload does not change:

Each core on the new server is capable of about 25% more processing throughput. Or, put another way, each core of the old server is capable of about 75% of the processing throughput of a new core. So roughly (8 * 0.75) old cores equates to about 6 cores on the new server.
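A quick check of that arithmetic:

~~~
# 8 old cores at ~75% of a new core's throughput each
echo "8 * 0.75" | bc    # prints 6.00
~~~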

You will have to confirm how your application behaves, but if you are using the calculation to work out how much you can consolidate applications on a new virtualized server, it gives you a good idea of what to expect. If it is virtualized, you can also right-size after monitoring to fine-tune if you have to.

Murray Oldfield · Oct 25, 2016

Hi, if the question is how to format, then I use three tildes ("~~~") on the line immediately before and after the block of code. The following code block has the tildes fencing the code on the line immediately before and immediately after the block.

eg.

~~~
//Get PDF stream object previously saved
  Set pdfStreamContainer = ##Class(Ens.StreamContainer).%OpenId(context.StreamContainerID)
  Try {
    Set pdfStreamObj = pdfStreamContainer.StreamGet()
  }
  Catch {
    $$$TRACE("Error opening stream object ID = "_context.StreamContainerID)
    Quit
  }
//Set PDF stream object into new OBX:5 segment
  Set status = target.StoreFieldStreamBase64(pdfStreamObj,"OBX("_count_"):ObservationValue(1)")
  Set ^testglobal("status",$H) = status
~~~
I note that the engine for the community website should automatically highlight known languages, but you can override this.

eg. Put the language name after the opening tildes on the first line: "~~~javascript" or "~~~python".


~~~javascript
var s = "JavaScript syntax highlighting";
alert(s);
~~~

~~~python
s = "Python syntax highlighting"
print s
~~~