Just a hint: if you are running Linux, you can use the shell to quickly script a progression of runs, for example starting with 2 processes and stepping up in twos to 30 processes.

for i in $(seq 2 2 30); do echo "do ##class(PerfTools.RanRead).Run(\"/db/RANREAD\",${i},10000)" | csession H2015 -U "%SYS"; done

Piping the command to csession requires that operating-system-based authentication is enabled and that your Unix shell user exists in Caché.

See "Configuring for Operating-System-Based Authentication" in the Caché documentation (DocBook).
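For longer progressions it can be handier to wrap the loop in a small script. This is only a sketch: the instance name H2015 and the database path come from the example above, and the leading echo prints each pipeline instead of running it, so you can check the commands before removing it to execute for real.

```shell
#!/bin/sh
# Sketch: step RanRead from 2 to 30 processes in twos.
# INSTANCE and DBPATH follow the example above -- adjust for your system.
INSTANCE="H2015"
DBPATH="/db/RANREAD"
ITERATIONS=10000

for i in $(seq 2 2 30); do
    CMD="do ##class(PerfTools.RanRead).Run(\"${DBPATH}\",${i},${ITERATIONS})"
    # Dry run: print what would be piped to csession; drop the leading
    # 'echo' to execute for real (requires OS-based authentication).
    echo "echo '${CMD}' | csession ${INSTANCE} -U \"%SYS\""
done
```

The same pattern extends to any parameter sweep; just change the seq arguments.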

Hi, mirroring will not make much difference. Without ECP you may see tens of writes per second on the database server journal disk (e.g. the primary mirror). With ECP you will see perhaps thousands of small writes per second on the data server (e.g. the primary mirror) from journal sync activity, which is why you need such tight response times on the journal disk when using ECP. My next posts will be about ECP and will cover this in more detail.

Right now I am on leave in Bali with only occasional Internet access until late July, so look for a post in early August.

Hi,

Is there something unique in the ECP server? Can you create one OS image with Caché installed, the ECP service started, the correct mappings, the namespaces, etc., as a template?

Will the application server connect automatically to the named data server on Caché startup?

So on Application server:

[ECPServers]
MY_DATA_SERVER=dataserver_name.mydomain.com,1972,1

[Databases]
MY_DB_NAME=:mirror:MIRROR_NAME:MY_DB_NAME,MY_DATA_SERVER

[Namespaces]
MY_NS=MY_DB_NAME,MY_DB_NAME

[Map.MY_NS]
Global_STUFF*=MY_DB_NAME
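If you are stamping out many identical application servers from one image, the fragment above can be kept as a template and filled in per server. A minimal sketch, assuming a hypothetical template file ecp_template.cpf with an @DATASERVER@ placeholder (both names are my invention, not part of Caché):

```shell
#!/bin/sh
# Sketch: stamp a data server name into a CPF fragment template.
# ecp_template.cpf and the @DATASERVER@ placeholder are hypothetical names.
DATASERVER="dataserver_name.mydomain.com"

# Write a template containing the placeholder (quoted heredoc: no expansion).
cat > ecp_template.cpf <<'EOF'
[ECPServers]
MY_DATA_SERVER=@DATASERVER@,1972,1
EOF

# Substitute the real host name and emit the finished fragment.
sed "s/@DATASERVER@/${DATASERVER}/" ecp_template.cpf > ecp_fragment.cpf
cat ecp_fragment.cpf
```

The resulting fragment can then be merged into the instance's cpf file by whatever provisioning tool you use.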

Thanks for adding your experience. Yes, your method of sizing per user process makes perfect sense, and that is how I did it when working with client/server applications. I spend a lot of time now with a CSP (web) application, which has fewer server processes per user, so the calculations per user are different.

The same goes for memory: with memory so plentiful now, 1023 MB is often the default for the routine buffer, but smaller sites or small VMs may need it adjusted down.

The 60/40 rule came about because of a need to size a new site, but I also like the idea of using a percentage for expected active data. In the end the best path is to start in the ballpark with the rules we have, then adjust over time, if and when needed, based on constant monitoring.

Thanks again.
MO

Q2. You also mention basic errors you made in configuring it; which were they? It might be helpful to mention the debugging facilities for SNMP (^SYS("MONITOR","SNMP","DEBUG")) as well.

A2. One problem was misconfiguring the security settings in snmpd.conf. Following the example above will get you there.

I also spun my wheels with what turned out to be a spelling (or case) error on the line agentXSocket tcp:localhost:705. In the end I figured out that the problem was agentX not starting by looking at the logs written to the install-dir/mgr/SNMP.log file. Caché logs any problems encountered while establishing a connection or answering requests in SNMP.log. You should also check cconsole.log and the snmpd logs in the OS.

On Windows, iscsnmp.dll logs any errors it encounters in %System%\System32\snmpdbg.log (on a 64-bit Windows system, this file is in the SysWOW64 subdirectory).

As pointed out in Fabian's question, more information can be logged to SNMP.log if you set ^SYS("MONITOR","SNMP","DEBUG")=1 in the %SYS namespace and restart the ^SNMP Caché subagent process. This logs details about each message received and sent.
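Combining this with the csession piping trick from earlier in the thread, the debug flag can be toggled from the shell. A sketch, assuming the instance name H2015 and OS-based authentication; the leading echo prints the pipeline rather than running it.

```shell
#!/bin/sh
# Sketch: enable SNMP subagent debug logging from the shell.
# Instance name H2015 is an assumption; requires OS-based authentication.
INSTANCE="H2015"
CMD='set ^SYS("MONITOR","SNMP","DEBUG")=1'
# Dry run -- remove the leading echo to execute for real.
echo "echo '${CMD}' | csession ${INSTANCE} -U \"%SYS\""
# Remember to restart the ^SNMP subagent process afterwards.
```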

Thanks for the questions
MO

I was asked a couple of questions offline, so the following is to answer them:

Q1. In your article, why do you say it is necessary to change the information strings in snmpd.conf (i.e. syslocation/syscontact)?

A1. What I mean is that you should change syslocation and syscontact to reflect your site, but leaving them as the defaults will not stop SNMP from working with this sample snmpd.conf file.

Hi Michael, it's possible to update configuration files, run scripts, etc., to configure a system, so the short answer is yes. At the Ensemble level, also investigate what you can do with the Caché[/Ensemble/HealthShare] %installer. For example, I choose to do Caché configuration with %installer rather than edit the cpf file. I use both tools when configuring benchmark servers: I install web servers and Caché and configure the OS with Ansible, then use %installer to do the final Caché and application-level work such as creating databases, namespaces, and global mappings, and configuring global buffers. You can call Caché routines from the command line, so once Caché is installed you can run any routine. I'll create a post about this.
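As a sketch of the "call routines from the command line" step, a post-install hook might look like the following. The instance name H2015, manifest class MyApp.Installer, and routine ^BuildApp are all hypothetical; the helper prints each pipeline rather than executing it, so you can wire it into Ansible once the commands look right.

```shell
#!/bin/sh
# Sketch of a post-install step: run an installer manifest, then an app routine.
# INSTANCE, MyApp.Installer and ^BuildApp are hypothetical names.
INSTANCE="H2015"

run_in_cache() {
    # Dry run: print the pipeline instead of executing it.
    # Drop the echo wrapper to run for real (needs OS-based authentication).
    echo "echo '$1' | csession ${INSTANCE} -U \"%SYS\""
}

run_in_cache 'do ##class(MyApp.Installer).setup()'
run_in_cache 'do ^BuildApp'
```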

I suggest you map out the steps you need to do, then get familiar with the Ansible and Caché %installer actions in their respective docs. There is already a post on the Community about %installer.

Regards, MO

Thanks for the comments Francis, I think Mark sums up what I was aiming for. The first round of posts introduces the major system components that affect performance, and you are right that memory has a big role to play along with CPU and IO. There has to be a balance; to keep stretching the analogy, good nutrition and peak performance are the result of a balanced diet. Certainly badly sized or configured memory will cause performance problems for any application, and with Java applications this is obviously a big concern. My next post is about capacity planning for memory, so hopefully this will be useful, although I will be focusing more on the intersection with Caché. As Mark pointed out, NUMA can also influence performance, but there are strategies to plan for and mitigate its impact, which I will talk about in my Global Summit presentations and also cover in this series of posts.

Another aim of this series is to help customers who are monitoring their systems to understand which metrics are important, and from there use the pointers in these posts to start to unpack what's going on with their application and why, and whether action needs to be taken. The best benchmark is monitoring and analysing your own live systems.

There are several more articles to come before we are done with storage IO.

I will focus more on IOPS and writes in the coming weeks, and will show some examples and solutions for the type of problem you mentioned.

Thanks for the comment. I have quite a few more articles (in my head) for this series; I will be using the comments to help me decide which topics you are all interested in.

The latest Caché documentation has details and examples for setting up a read-only or read/write asynchronous reporting mirror. The async reporting mirror is special because it is not used for high availability; for example, it is not a DR server.

At the highest level, running reports or extracts on a shadow is possible simply because the data exists on the other server in near real time. Operational or time-critical reports should be run on the primary servers. The suggestion is that resource-heavy reports or extracts can use the shadow or reporting server.

While setting up a shadow or reporting async mirror is part of Caché, how a report or extract is scheduled or run is an application design question, and not something I can answer; hopefully someone else can jump in here with advice or experience.

Possibilities may include web services; or, if you use ODBC, your application could direct queries to the shadow or a reporting async mirror. For batch reports or extracts, routines could be scheduled on the shadow/reporting async mirror via the Task Manager. Or you may have a separate application module for this type of reporting.

If you need results returned to the application on the primary production system, that is also application dependent.

You should also consider how to handle (e.g. via global mapping) any read/write application databases, such as audit or log databases, which may otherwise be overwritten by the primary server.

If you are going to do reporting on a shadow server, search the online documentation for the special considerations under "Purging Cached Queries".