Clear filter
Announcement
Andreas Schneider · Feb 17, 2016
Caché Monitor is a database\sql tool primarily for InterSystems Caché but can also connect to MS SQL Server, MS Access and more databases. Within Caché Monitors Server Navigator you see all available Namespaces on your Caché Servers. No need to know the name of the Namespace, no need to configure many many JDBC Connections by hand. Just click on the namespace and see all objects like tables, views, classes and more...There is a beta build available with some new features: A main new feature in this build is called Query Cloud. With this feature you can write SQL Statements across multiple Caché Servers; Namespace and combine (SQL JOIN!) this data with other datasources like SQL Server; MS Access or simple CSV files. All this is done with zero installation on the Server side. More details: http://www.cachemonitor.de/cache-monitor-beta-releases/Please watch this video to see how it works. This video demonstrates how you can work with CSV files within the Query Cloud and query the data like database tables.Keep in mind please: all this is done locally and maby not the right way to work with very large tables with millions of rows. But maybe it is the right thing to make adhoc queries and analyse the data before you go on and export\import your data to combine it in one namespace for analysis purposes.An evaluation license is attached to this post. It would be great if you make some tests within your environment and play with this feature. I'm very interested in getting feedback (email preferred). Thanks for your time! Thank you Andreas! I used Caché Monitor many years ago, it is great! Server Navigator link doesn't work, would you please fix it? Evgeny thanks for the kind words and the hint about broken link!Andreas
Question
Tom Longmoore · Jun 22, 2016
My manager wants to send a couple of people to one of InterSystems's courses about developing Ensemble productions. I work in a healthcare setting, but my group does not do much work with HL7 interfaces. We mainly use Ensemble to implement custom (non-HL7) interfaces and web services/clients.With this in mind, which of the two available courses would make the most sense for us - Building Healthcare Productions or Building Business Productions? Has anyone taken one or both and, if so, which would you recommend? Tom,It's been a while since I took it but if I recall correctly you would want "Building Business Productions" which is basically the "Building Healthcare Productions" course with a day of HL7 content removed.HTH,Ben
Article
Murray Oldfield · Nov 25, 2016
Hyper-Converged Infrastructure (HCI) solutions have been gaining traction for the last few years with the number of deployments now increasing rapidly. IT decision makers are considering HCI when scoping new deployments or hardware refreshes especially for applications already virtualised on VMware. Reasons for choosing HCI include; dealing with a single vendor, validated interoperability between all hardware and software components, high performance especially IO, simple scalability by addition of hosts, simplified deployment and simplified management.
I have written this post with an introduction for a reader who is new to HCI by looking at common features of HCI solutions. I then review configuration choices and recommendations for capacity planning and performance when deploying applications built on InterSystems data platform with specific examples for database applications. HCI solutions rely on flash storage for performance so I also include a section on characteristics and use cases of selected flash storage options.
Capacity planning and performance recommendations in this post are specific to _VMWare vSAN_. However vSAN is not alone in the growing HCI market, there are other HCI vendors, notably _Nutanix_ which also has an increasing number of deployments. There is a lot of commonality between features no matter which HCI vendor you choose so I expect the recommendations in this post are broadly relevant. But the best advice in all cases is to discuss the recommendations from this post with HCI vendors taking into account your application specific requirements.
[A list of other posts in the InterSystems Data Platforms and performance series is here.](https://community.intersystems.com/post/capacity-planning-and-performance-series-index)
# What is HCI?
Strictly speaking converged solutions have been around for a long time, however in this post I am talking about current HCI solutions for example from [Wikipedia:](https://en.wikipedia.org/wiki/Hyper-converged_infrastructure) "Hyperconvergence moves away from multiple discrete systems that are packaged together and evolve into __software-defined__ intelligent environments that all run in __commodity, off-the-shelf x86 rack servers__...."
## So is HCI a single thing?
No. When talking to vendors you must remember HCI has many permutations; Converged and Hyper-converged are more a type of architecture not a specific blueprint or standard. Due to the commodity nature of HCI hardware the market has multiple vendors differentiating themselves at the software layer and/or other innovative ways of combining compute, network, storage and management.
Without going down too much of a rat hole here, as an example solutions labeled HCI can have storage inside the servers in a cluster or have more traditional configuration with a cluster of servers and separate SAN storage -- possibly from different vendors -- that has also been tested and validated for interoperability and managed from a single control plane. For capacity and performance planning you must consider solutions where storage is in an array connected over a SAN fabric (e.g. Fibre Channel or Ethernet) have a different performance profile and requirements to the case where the storage pool is software defined and located inside each of a cluster of server nodes with storage processing on the servers.
## So what is HCI again?
For this post I am focusing on HCI and specifically _VMware vSAN_ where _storage is physically inside the host servers_. In these solutions the HCI software layer enables the internal storage in each of multiple nodes in a cluster performing processing to act like one shared storage system. So another driver of HCI is even though there is a cost for HCI software there could also be significant savings using HCI when compared to solutions using enterprise storage arrays.
>For this post I am talking about solutions where HCI combines compute, memory, storage, network and management software into a cluster of virtualised x86 servers.
## Common HCI characteristics
As mentioned above _VMWare vSAN_ and _Nutanix_ are examples of HCI solutions. Both have similar high level approaches to HCI and are good examples of the format:
- _VMware vSAN_ requires VMware vSphere and is available on multiple vendors hardware. There are many hardware choices available but these are strictly dependent on VMware's vSAN Hardware Compatibility List (HCL). Solutions can be purchased prepackaged and preconfigured for example EMC VxRail or you can purchase components on the HCL and build-your-own.
- _Nutanix_ can also be purchased and deployed as an all-in-one solution including hardware in preconfigured blocks with up to four nodes in a 2U appliance. Nutanix solution is also available as a build-your-own software solution validated on other vendors hardware.
There are some variations in implementation, but generally speaking HCI have common features that will inform your planning for performance and capacity:
- Virtual Machines (VMs) run on hypervisors such as VMware ESXi but also others including Hyper-V or Nutanix Acropolis Hypervisor (AHV). Nutanix can also be deployed using ESXi.
- Host servers are often combined into blocks of compute, storage and network. For example a 2U Appliance with four nodes.
- Multiple host servers are combined into a cluster for management and availability.
- Storage is tiered, either all-flash or a hybrid with a flash cache tier plus spinning disks as a capacity tier.
- Storage is presented as a pool which is software defined including data placement and policies for capacity, performance and availability.
- Capacity and IO performance are scaled by adding hosts to the cluster.
- Data is written to storage on multiple cluster nodes synchronously so the cluster can tolerate host or component failures without data loss.
- VM availability and load balancing is provided by the hypervisor for example vMotion, VMware HA, and DRS.
As I noted above there are also other HCI solutions with twists on this list such as support for external storage arrays, storage only nodes... the list is a long as the list of vendors.
HCI adoption is gathering pace and competition between the vendors is driving innovation and performance improvements. It is also worth noting that HCI is a basic building block for cloud deployment.
# Are InterSystems' products supported on HCI?
It is InterSystems policy and procedure to verify and release InterSystems’ products against processor types and operating systems including when operating systems are virtualised. Please note [InterSystems Advisory: Software Defined Data Centers (SDDC) and Hyper-Converged Infrastructure (HCI)](https://www.intersystems.com/product-alerts-advisories/advisory-software-defined-data-centers-sddc-and-hyper-converged-infrastructure-hci).
For example: Caché 2016.1 running on Red Hat 7.2 operating system on vSAN on x86 hosts is supported.
Note: If you do not write your own applications you must also check your application vendors support policy.
# vSAN Capacity Planning
This section highlights considerations and recommendations for deployment of _VMware vSAN_ for database applications on InterSystems data platforms -- Caché, Ensemble and HealthShare. However you can also use these recommendations as a general list of configuration questions for reviewing with any HCI vendor.
## VM vCPU and memory
As a starting point use the same capacity planning rules for your database VMs' vCPU and memory as you already use for deploying your applications on VMware ESXi with the same processors.
As a refresher for general CPU and memory sizing for Caché a list of other posts in this series is here: [Capacity planning and performance series index.](https://community.intersystems.com/post/capacity-planning-and-performance-series-index)
One of the features of HCI systems is very low storage IO latency and high IOPS capability. You may remember from the 2nd post in this series the [hardware food groups graphic](https://dl.dropboxusercontent.com/u/25822386/InterSystems/performance2/foodGroups.png) showing CPU, memory, storage and network. I pointed out that these components are all related to each other and changes to one component can affect another, sometimes with unexpected consequences. For example I have seen a case of fixing a particularly bad IO bottleneck in a storage array caused CPU usage to jump to 100% resulting in even worse user experience as the system was suddenly free to do more work but did not have the CPU resources to service increased user activity and throughput. This effect is something to bear in mind when you are planning your new systems if your sizing model is based on performance metrics from less performant hardware. Even though you will be upgrading to newer servers with newer processors your database VM activity must be monitored closely in case you need to right-size due to lower latency IO on the new platform.
Also note, as detailed later you will also have to account for software defined storage IO processing when sizing _physical host_ CPU and memory resources.
## Storage capacity planning
To understand storage capacity planning and put database recommendations in context you must first understand some basic differences between vSAN and traditional ESXi storage. I will cover these first then break down all the best practice recommendations for Caché databases.
### vSAN storage model
At the heart of vSAN and HCI in general is software defined storage (SDS). The way data is stored and managed is very different to using a cluster of ESXi servers and a shared storage array. One of the advantages of HCI is there are no LUNs, instead pool(s) of storage that are allocated to VMs as needed with policies describing capabilities for availability, capacity, and performance per-VMDK.
For example; imagine a traditional storage array consisting of shelves of physical disks configured together as various sized disk groups or disk pools with different numbers and/or types of disk depending on performance and availability requirements. Disk groups are then presented as a number of logical disks (storage array volumes or LUNs) which are in turn presented to ESXi hosts as datastores and are formatted as VMFS volumes. VMs are represented as files in the datastores. Database best practice for availability and performance recommends at minimum separate disk groups and LUNs for database (random access), journals (sequential), and any others (such as backups or non-production systems, etc).
vSAN is different; storage from the vSAN is allocated using storage policy-based management (SPBM). Policies can be created using combinations of capabilities, including the following (but there are more);
- Failures To Tolerate (FTT) which dictates the number of redundant copies of data.
- Erasure coding (RAID-5 or RAID-6) for space savings.
- Disk stripes for performance.
- Thick or thin disk provisioning (thin by default on vSAN).
- Others...
VMDKs (individual VM disks) are created from the vSAN storage pool by selecting appropriate policies. So instead of creating disk groups and LUNs on the array with a set attributes, you define the capabilities of storage as policies in vSAN using SPBM; for example "Database" would be different to "Journal", or whatever others you need. You set the capacity and select the appropriate policy when you create disks for your VM.
Another key concept is a VM is no longer a set of files on a VMDK datastore but is stored as a set of _storage objects_. For example your database VM will be made up of multiple objects and components including the VMDKs, swap, snapshots, etc. vSAN SDS manages all the mechanics of object placement to meet the requirements of the policies you selected.
### Storage tiers and IO performance planning
To ensure high performance there are two tiers of storage;
- Cache tier - Must be high endurance flash.
- Capacity tier - Flash or for hybrid uses spinning disks.
As shown in the graphic below storage is divided into tiers and disk groups. In vSAN 6.5 each disk group includes a single cache device and up to seven spinning disks or flash devices. There can be up to five disk groups so possibly up to 35 devices per host. The figure below shows an all-flash vSAN cluster with four hosts, each host has two disk groups each with one NVMe cache disk and three SATA capacity disks.
_Figure 1. vSAN all-flash storage showing tiers and disk groups_
When considering how to populate tiers and the _type_ of flash for cache and capacity tiers you must consider the IO path; for the lowest latency and maximum performance writes go to the cache tier then software coalesces and de-stages the writes to the capacity tier. Cache use depends on deployment model, for example in vSAN hybrid configurations 30% of the cache tier is write cache, in the case of all-flash 100% of cache tier is write cache -- reads are from low latency flash capacity tier.
There will be a performance boost using all-flash. With larger capacity and durable flash drives available today the time has come where you should be considering whether you need spinning disks. The business case for flash over spinning disk has been made over recent years and includes much lower cost/IOPS, performance (lower latency), higher reliability (no moving parts to fail, less disks to fail for required IOPS), lower power and heat profile, smaller footprint, and so on. You will also benefit from additional HCI features, for example vSAN will only allow deduplication and compression on all-flash configurations.
- **_Recommendation:_** For best performance and lower TCO consider all-flash.
For best performance the cache tier should have the lowest latency, especially for vSAN as there is only a single cache device per disk group.
- **_Recommendation:_** If possible choose NVMe SSDs for the cache tier although SAS is still OK.
- **_Recommendation:_** Choose high endurance flash devices in the cache tier to handle high I/O.
For SSDs at the capacity tier there is negligible performance difference between SAS and SATA SSDs. You do not need to incur the cost of NVMe SSD at the capacity tier for database applications. However in all cases ensure you are using enterprise class SATA SSDs with features such as power failure protection.
- **_Recommendation:_** Choose high capacity SATA SSDs for capacity tier.
- **_Recommendation:_** Choose enterprise SSDs with power failure protection.
Depending on your timetable new technologies such as such as 3D Xpoint with higher IOPS, lower latency, higher capacity and higher durability may be available. There is a breakdown of flash storage at the end of this post.
- **_Recommendation:_** Watch for new technologies to include such as 3D Xpoint for cache AND capacity tier.
As I mentioned above you can have up to five disk groups per host and a disk group is made up of one flash device and up to seven devices at the capacity tier. You could have a single disk group with one flash device and as much capacity as you need, or multiple disk groups per host. There are advantages to having multiple disk groups per host:
- Performance: Having multiple flash devices at the tiers will increase the IOPS available per host.
- Failure domain: Failure of a cache disk impacts the entire disk group, although availability is maintained as vSAN rebuilds automatically.
You will have to balance availability, performance and capacity, but in general having multiple disk groups per host is a good balance.
- **_Recommendation:_** Review storage requirements, consider multiple disk groups per host.
#### What performance should I expect?
A key requirement for good application user experience is low storage latency; the usual recommendation is that database read IO latency should be below 10ms. [Refer to the table from Part 6 of this series here for details.](https://community.intersystems.com/post/data-platforms-and-performance-part-6-cach%C3%A9-storage-io-profile)
For Caché database workloads tested using the default vSAN storage policy and Caché [RANREAD utility](https://community.intersystems.com/post/random-read-io-storage-performance-tool) I have observed sustained 100% random read IO over 30K IOPS with less than 1ms latency for all-flash vSAN using Intel S3610 SATA SSDs at the capacity tier. Considering that a basic rule of thumb for Caché databases is to size instances to [use memory for as much database IO as possible](https://community.intersystems.com/post/intersystems-data-platforms-and-performance-part-4-looking-memory) all-flash latency and IOPS capability should provide ample headroom for most applications. Remember memory access times are still orders of magnitude lower than even NVMe flash storage.
As always remember your mileage will vary; storage policies, number of disk groups and number and type of disks etc will influence performance so you must validate on your own systems!
## Capacity and performance planning
You can calculate the raw TB capacity of a vSAN storage pool roughly as the total size of disks in the capacity tier. In our example configuration in _figure 1_ there are a total of 24 x INTEL S3610 1.6TB SSDs:
>Raw capacity of cluster: 24 x 1.6TB = 38.4 TB
However _available_ capacity is much different and where calculations get messy and is dependent on configuration choices; which policies are used (such as FTT which dictates how many copies of data) and also whether deduplication and compression have been enabled.
I will step through selected policies and discuss their implications for capacity and performance and recommendations for a _database workload_.
All ESXi deployments I see are made up of multiple VMs; for example, TrakCare a unified healthcare information system built on InterSystems’ health informatics platform, HealthShare is at its heart at least one large (monster) database server VM which is absolutely fits the description "tier-1 business critical application". However a deployment also includes combinations of other single purpose VMs such as production web servers, print servers, etc. As well as test, training and other non-production VMs. Usually all deployed in a single ESXi cluster. While I focus on database VM requirements remember that SPBM can be tailored per VMDK for all your VMs.
### Deduplication and Compression
For vSAN deduplication and compression is a cluster-wide on/off setting. Deduplication and compression can only be enabled when you are using an all-flash configuration. Both features are enabled together.
At first glance deduplication and compression seems to be a good idea - you want to save space, especially if you are using (more expensive) flash devices at the capacity tier. While there are space savings with deduplication and compression my recommendation is that you do not enable this feature for clusters with large production databases or where data is constantly being overwritten.
Deduplication and compression does add some processing overhead on the host, maybe in the range of single digit %CPU utilization, but this is not the primary reason not recommending for databases.
In summary vSAN attempts to deduplicate data as it is written to the capacity tier within the scope of a single disk group using 4K blocks. So in our example at _figure 1_ data objects to be deduplicated would have to exists in the capacity tier of the same disk group. I am not convinced we will see much savings on Caché database files which are basically very large files filled with 8K database blocks with unique pointers, contents, etc. Secondly vSAN will only attempt to compress duplicated blocks, and will only consider blocks compressed if compression reaches 50% or more. If the deduplicated block does not compress to 2K it is written uncompressed. While there may be some duplication of operating system or other files _the real benefit of deduplication and compression would be for clusters deployed for VDI_.
Another caveat is the impact of a (albeit rare) failure of one device in a disk group on the whole group when deduplication and compression is on. The whole disk group is marked "unhealthy" which has a cluster wide impact: because the group is marked unhealthy all the data on a disk group will be evacuated off that group to other places, then the device must be replaced and vSAN will resynchronise the objects to rebalance.
- **_Recommendation:_** For database deployments do not enable compression and deduplication.
>_**Sidebar: InterSystems database mirroring.**_
> For mission critical tier-1 Caché database application instances requiring the highest availability I recommend [InterSystems synchronous database mirroring, even when virtualised.](http://docs.intersystems.com/latest/csp/docbook/DocBook.UI.Page.cls?KEY=GHA_mirror#GHA_mirror_set_bp_vm) Virtualised solutions have HA built in; for example VMWare HA, however additional advantages of also using mirroring include:
>- Separate copies of up-to-date data.
- Failover in seconds (faster than restarting a VM then operating System then recovering Caché).
- Failover in case of application/Caché failure (not detected by VMware).
>I am guessing you have spotted the flaw in enabling deduplication when you have mirrored databases on the same cluster? You will be attempting to deduplicate your mirror data. Generally not sensible and also a processing overhead.
>Another consideration when deciding whether to mirror databases on HCI is the total storage capacity required. vSAN will be making multiple copies of data for availability, this data storage will be doubled again by mirroring. You will need to weigh the small incremental increase in uptime over what VMware HA provides against the additional cost of storage.
>For maximum uptime you can create two clusters so that each node of the database mirror is in a completely independent failure domain. However take note of the total servers and storage capacity to provide this level of uptime.
## Encryption
Another consideration is where you choose to encrypt data at rest. You have several choices in the IO stack including;
- Using Caché database encryption (encrypts database only).
- At Storage (e.g. hardware disk encryption at SSD).
Encryption will have a very small impact on performance, but can have a big impact on capacity if you choose to enable deduplication or compression in HCI. If you do choose deduplication and/or compression you would not want to be using Caché database encryption because it would negate any gains as encrypted data is random by design and does not compress well. Consider the protection point or risk they are trying to protect from, for example theft of file vs. theft of device.
- **_Recommendation:_** Encrypt at the lowest layer as possible in the IO stack for a minimal level of encryption. However the more risk you want to protect move higher up the stack.
### Failures To Tolerate (FTT)
FTT sets a requirement on the storage object to tolerate at least _n_ number of concurrent host, network, or disk failures in the cluster and still ensure the availability of the object. The default is _1_ (RAID-1); the VM’s storage objects (e.g. VMDK) are mirrored across ESXi hosts.
>So vSAN configuration must contain at least n + 1 replicas (copies of the data) which also means there are 2n + 1 hosts in the cluster.
For example to comply with a number of failures to tolerate = 1 policy, you need three hosts at a minimum at all times -- even if one host fails. So to account for maintenance or other times when a host is taken off-line you need four hosts.
- **_Recommendation:_** A vSAN cluster must have a minimum four hosts for availability.
Note there is also exceptions; a Remote Office Branch Office (ROBO) configuration that is designed for two hosts and a remote witness VM.
### Erasure Coding
The default storage method on vSAN is RAID-1 -- data replication or mirroring. Erasure coding is RAID-5 or RAID-6 with storage objects/components distributed across storage nodes in the cluster. The main benefit of erasure coding is better space efficiency for the same level of data protection.
Using the calculation for FTT in the previous section as an example; for a VM to tolerate _two_ failures using a RAID-1 there must be three copies of storage objects meaning a VMDK will consume 300% of the base VMDK size. RAID-6 also allows a VM to tolerate two failures and only consumes 150% the size of the VMDK.
The choice here is between performance and capacity. While the space saving is welcome you should consider your database IO patterns before enabling erasure coding. Space efficiency benefits come at the price of the amplification of I/O operations which is higher again during times of component failure so for best database performance use RAID-1.
- **_Recommendation:_** For production databases do not enable erasure coding. Enable for non-production.
Erasure coding also impacts the number of hosts required in your cluster. For for example for RAID-5 you need a minimum of four nodes in the cluster, for RAID-6, you need a minimum of six nodes.
- **_Recommendation:_** Consider the cost of additional hosts before planning to configure erasure coding.
### Striping
Striping offers opportunity for performance improvements but will likely only help with hybrid configurations.
- **_Recommendation:_** For production databases do not enable striping.
### Object Space Reservation (thin or thick provisioning)
The name for this setting comes from vSAN using objects to store components of your VMs (VMDKs etc). By default all VMs provisioned to a VSAN datastore have object space reservation of 0% (thin provisioned) which leads to space savings and also enables vSAN more freedom for placement of data. However for your production databases best practice is to use 100% reservation(thick provisioned) where space is allocated at creation. For vSAN this will be Lazy Zeroed – where 0’s are written as each block is first written to. There are a few reasons for choosing 100% reservation for production databases; there will be less delay when database expansions occur, and you are guaranteeing that storage will be available when you need it.
- **_Recommendation:_** For production database disks use 100% reservation.
- **_Recommendation:_** For non-production instances leave storage thin provisioned.
### When should I turn on features?
You can generally enable availability and space saving features after using the systems for some time, that is; when there are active VMs and users on the system. However there will be performance and capacity impact. Additional replicas of data in addition to the original are needed so additional space is required while data is synchronised. My experience is that enabling these type of features on clusters with large databases can take a very long time and expose the possibility of reduced availability.
- **_Recommendation:_** Spend time up front to understand and configure storage features and functionality such as deduplication and compression before go-live and definitely before large databases are loaded.
There are other considerations such as leaving free space for disk balancing, failure etc. The point is you will have to take into account the recommendations in this post with vendor specific choices to understand your raw disk requirements.
- **_Recommendation:_** There are many features and permutations. Work out your total GB capacity requirements as a starting point, review recommendations in this post [and with your application vendor] then talk to your HCI vendor.
## Storage processing overhead
You must consider the overhead of storage processing on the hosts. Storage processing otherwise handled by the processors on an enterprise storage array is now being computed on each host in the cluster.
The amount of overhead _per host_ will be dependent on workload and what storage features are enabled. My observations with basic testing I have done with Caché on vSAN shows that processing requirements are not excessive, especially when you consider the number of cores available on current servers. VMware recommends planning for 5-10% host CPU usage
The above can be a starting point for sizing but _remember your mileage will vary_ and you will need to confirm.
- **_Recommendation:_** Plan for worst case of 10% CPU utilisation and then monitor your real workload.
## Network
Review vendor requirements -- assume minimum 10GbE NICs -- multiple NICs for storage traffic, management (e.g. vMotion), etc. I can tell you from painful experience that an enterprise class network switch is required for optimal operation of the cluster -- after all - all writes are sent synchronously over the network for availability.
- **_Recommendation:_** Minimum 10GbE switched network bandwidth for storage traffic. Multiple NICs per host as per best practice.
# Flash Storage Overview
Flash storage is a requirement of HCI so it is good to review where flash storage is today and where its going in the near future.
_The short story is whether you use HCI or not if you are not deploying your applications using storage with flash today it is likely that your next storage purchase will include flash._
## Storage today and tomorrow
Let us review the capabilities of commonly deployed storage solutions and be sure we are clear with the terminology.
**Spinning disk**
- Old faithful. 7.2, 10K or 15K HDD spinning disks with SAS or SATA interface. Low IOPS per disk. Can be high capacity but that means the IOPS per GB are decreasing. For performance typically data is striped across multiple disks to achieve 'just enough' IOPS with high capacity.
**SSD disk - SATA and SAS**
- Today flash is usually deployed as SAS or SATA interface SSDs using NAND flash. There is also some DRAM in the SSD as a write buffer. Enterprise SSDs include power loss protection - in event of power failure contents of DRAM are flushed to NAND.
**SSD disk - NVMe**
- Similar to SSD disk but uses NVMe protocol (not SAS or SATA) with NAND flash. NVMe media attach via PCI Express (PCIe) bus allowing the system to talk directly without the overhead of host bus adapters and storage fabrics resulting in much lower latency.
**Storage Array**
- Enterprise Arrays provide protection and the ability to scale. It is more common today that storage is either a hybrid array or all-flash. Hybrid arrays have a cache tier of NAND flash plus one or more capacity tiers using 7.2, 10K or 15K spinning disks. NVMe arrays are also becoming available.
**Block-Mode NVDIMM**
- These devices are shipping today and are used when extremely low latencies are required. NVDIMMs sit in a DDR memory socket and provide latencies around 30ns. Today they ship in 8GB modules so are not likely to be used for legacy database applications, but new scale-out applications may take advantage of this performance.
**3D XPoint**
_This is a future technology - not available in November 2016._
- Developed by Micron and Intel. Also known as **Optane** (Intel) and **QuantX** (Micron).
- Will not be available until at least 2017 but compared to NAND promises higher capacity, >10x more IOPS, >10x lower latency with extremely high Endurance and consistent performance.
- First availability will use NVMe protocol.
## SSD device Endurance
SSD device _endurance_ is an important consideration when choosing drives for cache and capacity tiers. The short story is that flash storage has a finite life. Flash cells in an SSD can only be deleted and rewritten a certain number of times (no restrictions apply to reads). Firmware in the device manages spreading writes around the drive to maximise the life of the SSD. Enterprise SSDs also typically have more real flash capacity than visible to achieve longer life (over-provisioned), for example an 800GB drive may have more than 1TB of flash.
The metric to look for and discuss with your storage vendor is full Drive Writes Per Day (DWPD) guaranteed for a certain number of years. For example; An 800GB SSD at 1 DWPD for 5 years can have 800GB per day written for 5 years. So the higher the DWPD (and years) the higher the endurance. Another metric simply switches the calculation to show SSD devices specified in Terabytes Written (TBW); The same example has TBW of 1,460 TB (800GB * 365 days * 5 years). Either way you get an idea of the life of the SSD based on your expected IO.
# Summary
This post covers the most important features to consider when deploying HCI and specifically VMWare vSAN version 6.5. There are vSAN features I have not not covered, if I have not mentioned a feature assume you should use the defaults. However if you have any questions or observations I am happy to discuss via the comments section.
I expect to return to HCI in future posts, this certainly is an architecture that is on the upswing so I expect to see more InterSystems customers deploying on HCI.
Very useful article, thanks, but at the moment I'd very cautious recommending Intel/Micron 3D XPoint memory technology. It looks like the real numbers are very far from the original claim (especially endurance improvements) - http://semiaccurate.com/2016/09/12/intels-xpoint-pretty-much-broken/ OTOH their performance numbers are very impressive, even today (especially for Micron part) - http://www.tomshardware.com/reviews/3d-xpoint-guide,4747-6.html Hi Timur, thanks for the comments and links. I agree, 3D XPoint is a case of waiting to see real performance when it's released. Even 10x lower latency is still a big jump - the figures in the post are what is publicly talked about by Micron now. My aim is to give people a heads up on what's coming and to look out for it (although vendors will be shouting it from the rooftops :) Hopefully we will have some real data and pricing soon. Thanks Murray for these 'new technology catch-up' articles, especially this part, 9 and 10. Bob alerted me to these. I do have HCI deployment in the horizon based on ESXi and EMC scaleio all-flash (both cache and capacity tiers) architecture. I will keep this in mind when we finally meet the vendors of the HCI kit.In the article you mentioned "you define the capabilities of storage as policies in vSAN using SPBM; for example "Database" would be different to "Journal". I was hoping to see specific policies for these further down the article?? (well if you consider i'm from traditional arrays where we normally pay attention to these).Regards;Anzelem. Hi, It would have been better to say "Database" could be different to "Journal". SPBM can be different for all VMDKs (disks), but that doesn't mean it should be. As an example on a four node all flash cluster;I am using just two storage policies for a production VM disks.- Policy: VSAN default: for OS and backup/scratch. Failures to tolerate (FTT)=1. Disk stripes=1. Object space reservation=0.- Policy: Production: for database and journals. FTT=1. Disk stripes=1. Object space reservation=100.For performance on each production VM use separate PVSCI adapters for OS, journal, database, and backup/scratch. For non-production VMs I have a policy that makes better use of available capacity, there is still HA in the storage:- Policy: Non-Production: for all data disks. FTT method=RAID 5/6 (Erasure coding). FTT=1. Object space reservation=0.Note. These values are not written in stone and will depend on your requirements. While you need to think about performance it should be great out of the box. What you must also consider is availability and capacity. Hi Murray;Have you tested/worked on HCI with VMware and ScaleIO?Regards;Anzelem. Hi Anzelem, no. But I m very interested to hear of experience community readers with any HCI solution. Either through the comments or directly.
Announcement
Janine Perkins · Nov 28, 2016
Learn how to work with DICOM Modality Worklists in a HealthShare Health Connect production.DICOM is a global information technology standard for handling medical images. We will cover the parts of a HealthShare Health Connect production that are at work when handling DICOM requests, and we will use the built-in demo production to simulate communication with an external modality. Learn More. I'm looking for any online videos you have demonstrating Health Insight, not specificaly DICOM Modality worklists.Thank You.Glenn Mamarygmamary@j2interactive.com Glenn,We are working on some more detailed courses on Health Insights. Right now we have a Health Insight Resource Guide that can start you off with an Overview and provides access to the documentation as long are a customer/partner. Hi,the link seems to be broken. Any chance to get this article back up?best regards,sebastian Digging the intersystems learning facility the course can also be found through the learning catalogue on learning.intersystems.com.best regards,Sebastian
Announcement
Anastasia Dyubaylo · May 7, 2021
Hey developers,
The registration period for the FHIR Accelerator programming contest is in full swing! We invite all FHIR developers to build new or test existing applications using the InterSystems IRIS FHIR Accelerator Service (FHIRaaS) on AWS.
And now you have a great opportunity to get FREE access to the FHIRaaS on AWS! So, to take the first step in mastering the FHIRaaS, you need to register on our FHIR Portal using this link:
👉🏼 https://portal.trial.isccloud.io/account/signup
Just follow the link above and become a master of the FHIRaaS with InterSystems! ✌🏼
Feel free to ask any questions regarding the competition here or in the discord-contests channel.
Happy coding! We will be ready to start sharing the access codes for the FHIRaaS portal starting from Thursday 14th of May. Please refer to @Irina.Podmazko in Direct Message or reply to this post or request in Discord! Some updates:
Now you can easily get your FREE access to InterSystems IRIS FHIR Accelerator Service (FHIRaaS) on AWS! Just follow this link and register on the ISC Dev FHIR Portal:
👉🏼 https://portal.trial.isccloud.io/account/signup
An easy start to join our competition! Don't miss it!
Question
MohanaPriya Vijayan · May 28, 2021
Does InterSystems IRIS will support Visual Studio 6.0 Enterprise Edition (Visual Basic)?
We are in the process of transitioning Intersystems Cache 2017 to Intersystems IRIS 2020 version.
For terminal based applications we can able to use the same DAT file used for Cache with minor changes.
For Web based we are using Visual studio 6.0(Visual Basic). Will IRIS supports Visual Studio 6?
While I am configuring from that GUI application, I am getting the error like, Access Denied or No connection could be established. Same configuration followed in Cache that works. What the component did you use for connection?
There are a few technologies that was deprecated with IRIS, or under an additional license option. We are just using the namespace and its Base_TCP_Port(Superserver Port) for connection.
Announcement
Anastasia Dyubaylo · Jul 5, 2021
Hey Developers,
We have some good news for you:
💥 InterSystems AI contest participants can use Embedded Python in their solutions! So if you are not yet a member of the Embedded Python Early Access Program (EAP), now is the time!
Refer to python-interest@intersystems.com and you'll get FREE access to the InterSystems IRIS Embedded Python features.
In addition, we invite all EAP participants to the special Embedded Python kick-off webinar tomorrow, July 6 at 10:00 AM EDT – an easy start on how to use Embedded Python! Demonstration of the new features of the data platform, examples applications, and of course rewards.
After you become an EAP member, you will receive a special link to join the kick-off webinar:
➡️ RSVP: python-interest@intersystems.com
So,
Now any Python developer can easily join the current InterSystems AI contest!
Duration: June 28 - July 25, 2021
Total prize: $8,750
Don't miss it! Hey,
Become a member of the Early Access Program program and you will get access to a private EAP channel on our DC Discord Server!
Don't miss such an opportunity!
Note: Didn't find yourself in a private channel? Chat to me in Discord ;)
Article
Piyush Adhikari · Oct 19, 2022
I am demonstrating a use case of how we can create an IRIS Interoperability Production for special use in an external language. InterSystems IRIS, within Interoperability has a framework called Production Extension (PEX), using which we can create productions and program them as per their purpose using external languages like Java, Python etc, and also develop custom inbound and outbound adapters to communicate with other applications. Here in this demo, I will demonstrate a PEX framework-based production created by @Guillaume.Rongier7183 on Python, and I am using ‘task-specific’ business operations created by @Lucas.Enard2487 in that production to interoperate with a third party open source machine learning platform called HuggingFace, and bring its machine learning capabilities to InterSystems IRIS via PEX interoperability framework and Python.
Walkthrough video: https://www.loom.com/share/239a15e8c510406faac1bcdea8030d1d
Prerequisite
Docker and Docker Compose
Installation and testing
Clone the repository https://github.com/LucasEnard/Contest-Sustainability
Open the folder where the above repo is cloned, and open command prompt and start the docker container using the following command: docker-compose up
The container may start ‘unhealthy’ therefore, it is also a nice practice to start the container with the command: docker-compose up -d
Identify the port that the localhost is running on by using Docker command line and run the IRIS instance via the docker container
Open the production by clicking on the link as follows: http://localhost:port/csp/irisapp/EnsPortal.ProductionConfig.zen?RODUCTION=INFORMATION.QuickFixProduction
There are a few business operations, each for some HuggingFace models. Enable an operation and go to settings.
And on the ‘Python’ section, fill the boxes as per the model you would like to use.
Follow the content on the link as follows for reference: https://github.com/LucasEnard/Contest-Sustainability#521-settings
And click ‘Apply’.
Go to ‘Actions’ and run a test.
In the dialogue box that next appears, select request type as Grongier.PEX.Message.
And, the classname as msg.MLRequest.
And the JSON must include arguments needed by the model.
In the content on the link as follows, there are JSON content for some models: https://github.com/LucasEnard/Contest-Sustainability#522-testing
Click on ‘Invoke Testing Service’ to see test results.
Click on ‘Visual Trace’ and click on messages to see the response and content.
Exception/s
One exception I came across while running this demo on my machine was the docker container starting as ‘unhealthy’.
Adding the HEALTHCHECK statement as follows in the Dockerfile seems to fix the issue.
References
Lucas, E. (2022). Contest-Sustainability. [online] GitHub. Available at: https://github.com/LucasEnard/Contest-Sustainability [Accessed 20 Sep. 2022].
Enard, L. (n.d.). Sustainable Machine Learning for the InterSystems Interoperability Contest. [online] InterSystems Developer Community. Available at: https://community.intersystems.com/post/sustainable-machine-learning-intersystems-interoperability-contest [Accessed 20 Sep. 2022].
This is great @Piyush.Adhikari - nice example of InterSystems IRIS for Machine Learning! Hey @Piyush.Adhikari!
Congrats on your first contribution to the Developer Community! Welcome to the club :) Thank you for using my work and for the credit.
I'm glad you find a way to play and learn with it. Interesting Article, thanks for sharing. Nice tip re: HEALTHCHECK
Thanks @Piyush.Adhikari
Announcement
Anastasia Dyubaylo · Jan 28, 2020
Hi Community!
New "Coding Talk" video is already on InterSystems Developers YouTube:
⏯ How to Install and Use ObjectScript Package Manager with InterSystems IRIS
In this screencast, @Evgeny Shvarov describes how to install and use the ObjectScript Package Manager (ZPM) with InterSystems IRIS.
➡️ Download ZMP from Open Exchange
You can user InterSystems IRIS Community Edition to work with Package Manager: InterSystems IRIS on Docker
Packages we tested in the video:
ObjectScript Math
ZPM command: install objectscript-math
WebTerminal
ZPM command: install webterminal
Samples-BI
ZPM command: install samples-bi
DeepSeeWeb
ZPM command: install dsw
And...
You're very welcome to watch all Coding Talks in a dedicated "Coding Talks" playlist on our InterSystems Developers YouTube Channel.
Enjoy watching the video! 👍🏼
Announcement
Anastasia Dyubaylo · Mar 28, 2019
Hi Everyone!
Please meet InterSystems at hub.berlin - Europe's interactive business festival for digital movers and makers on 10 - 11 April 2019 in Berlin.
We look forward to two-day inspirational lectures and intensive technical discussions and invite you and your colleagues to our InterSystems booth for a personal conversation. In addition, we'll also present a keynote presentation and host a masterclass session.
See the details below.
InterSystems Keynote | 10 April 2019, 11:30 – 11:50
Interoperability enables the next wave of intelligent service-rich applications | Thomas Dyar
The next generation of digital solutions will be increasingly complex, as they leverage more data and a growing array of intelligent services. I will review how these trends shift the challenges from custom software and model development to integrating myriad services and ensuring they interoperate. With traditional programming environments and even dedicated low-code platforms, managing more than a few connected services becomes complex and unwieldy. Also, the volume, velocity and variety of data compounds these integration problems since traditional databases cannot efficiently provide transactional and analytic workload support at scale. Then we will see how developers in logistics, finance and healthcare fields are composing data-intensive, intelligent applications that overcome these obstacles, using new methodologies and platforms specifically designed for this latest phase of technology evolution.
More info here.
InterSystems Masterclass | 10 April 2019, 14:10 – 15:30
Learn how to build and scale intelligent service-rich applications with less custom code | @Benjamin.DeBoe, @Stefan.Wittmann, Thomas Dyar
The new landscape of intelligent services let you build and scale applications and services through integration and interoperability, while relying less on custom development. Thomas Dyar shows you how you can leverage the wide array of services available in your big data applications, highlighting the critical aspects of interoperability when combining multiple services that consume disparate types of data. Then you'll build an intelligent application using Spark for machine learning, and the InterSystems IRIS data platform for service coordination and data management, among other technologies.
What you'll learn, and how you can apply it?
Explore intelligent services concepts and best practices for designing analytic applications
Learn design patterns for ingesting, collecting, storing, analyzing, and visualizing big data
Build a data-intensive application using technologies such as InterSystems IRIS and SparkML
And...
If you still do not have a ticket for hub.berlin: with the promo code hb19-intersystems you get a 20% discount on the regular price of the ticket.
Register now and see you at the event!
Question
Evgeny Shvarov · Jul 23, 2019
Hi developers!Just want to check with you on best practices for that.You collaborate for InterSystems IRIS repository. You fork it, then make changes, commit, push, pull request, discuss(if any), your PR is accepted.What's next? Do you delete the repository you forked in? Something to note, if you delete the repo, the pull request will show up as "unknown repository"and any history attached to that repo will be lost. Also any references to it will of course be broken. But deleting the branch is encouraged by github and won't break any references. For me, I don't like broken links and references, but of course there's the argument of wanting a clean profile instead of 1000 old forked repos :) Good point David, thanks! Keep the fork for the next time you contribute. The issue is, if you keep it Github shows that it is "Behind the remote origin". How do you fix this? Or how do you deal with this? I typically leave it for some amount of time. I sometimes go through my repos and delete the stale forks.
Even though it does have the broken links back to the deleted repo, the PR merge will show the commit history in the new repo, which I think is the important part Neither alternatives. I'd usually __archive__ the repository instead.
Announcement
Andreas Dieckow · Oct 17, 2019
With the recent release of macOS 10.15, Apple has tightened its control mechanism , called Gatekeeper, so that it now requires executables to be notarized. InterSystems products are not currently supported for use on macOS 10.15 and the executables have not been notarized. (As a reminder, InterSystems products are supported on macOS as a development platform only.)
InterSystems is working to provide compatibility with macOS 10.15 for future releases of InterSystems IRIS, InterSystems IRIS for Health, Caché, and Ensemble. Until that time, we recommend not running InterSystems products on macOS 10.15. Another option is running InterSystems IRIS in a container on macOS including 10.15.
If you have any questions regarding this advisory, please contact the Worldwide Response Center.
Announcement
Anastasia Dyubaylo · Aug 14, 2019
Hi Community!New "Coding Talk" video is already on InterSystems Developers YouTube:How to Submit Your InterSystems Solution, Connector or Library to Open ExchangeIn this screencast, presented by @Evgeny Shvarov, you will know how to submit the GitHub application to InterSystems Open ExchangeLearn more about how to publish your applications on InterSystems Open Exchange in this post.And...You're very welcome to watch all Coding Talks in a dedicated "Coding Talks" playlist on our InterSystems Developers YouTube Channel.Stay tuned!
Article
Evgeny Shvarov · Jun 8, 2023
Hi Community!
Just want to share with you an exercise I made to create "my own" chat with GPT in Telegram.
It became possible because of two components on Open Exchange: Telegram Adapter by @Nikolay.Soloviev and IRIS Open-AI by @Francisco.López1549
So with this example you can setup your own chat with ChatGPT in Telegram.
Let's see how to make it work!
Prerequisites
Create a bot using @BotFather account and get the Bot Token. Then add bot into a telegram chat or channel and give it admin rights. Learn more at https://core.telegram.org/bots/api
Open (create if you don't have it) an account on https://platform.openai.com/ and get your Open AI API Key and Organization id.
Make sure you have IPM installed in your InterSystems IRIS. if not here is one liner to install:
USER> s r=##class(%Net.HttpRequest).%New(),r.Server="pm.community.intersystems.com",r.SSLConfiguration="ISC.FeatureTracker.SSL.Config" d r.Get("/packages/zpm/latest/installer"),$system.OBJ.LoadStream(r.HttpResponse.Data,"c")
Or you can use community docker image with IPM onboard like this:
$ docker run --rm --name iris-demo -d -p 9092:52797 -e IRIS_USERNAME=demo -e IRIS_PASSWORD=demo intersystemsdc/iris-community:latest
$ docker exec -it iris-demo iris session iris -U USER
USER>
Installation
Install the IPM package in a namespace with Interoperability enabled.
USER>zpm "install telegram-gpt"
Usage
Open the production.
Put your bot's Telegram Token into Telegram business service and Telegram Business operation both:
Also initialize St.OpenAi.BO.Api.Connect operation with your Chat GPT API key and Organization id:
Start the production.
Ask any question in the telegram chat. You'll get an answer via Chat GPT. Enjoy!
And in visual trace:
Details
This example uses 3.5 version of Chat GPT Open AI. It could be altered in the data-transformation rule for the Model parameter.
pic 1 didnt show up How about now? It looks like that organization field for Open AI integration is not mandatory, so only Telegram Token and ChatGPT key needed. Great!!! Good job Thank you, @Francisco.López1549! And thanks for introducing chatGPT package to the community! In a new version can also be installed as:
USER>zpm "install telegram-gpt -D TgToken=your_telegram_token -D GPTKey=your_ChatGPT_key"
so you can pass the Telegram Token and ChatGPT API keys as production parameters. A new version is coming soon... New features 😉 Looking forward!
Announcement
Jacquie Clermont · Nov 30, 2022
Hi Community:
Pleased to let you know that in Forrester's latest "Wave" report on analytical data platforms, we have been designated a "leader."
You can learn more from this InterSystems Press Release, or even better, read The Forrester Wave™: Translytical Data Platforms, Q4 2022.