You could buffer it up, but there will still be a period of writing to disk where the other collection process could grab the file mid-write.

I normally write the file to a temp folder, or use a temp file name, and then rename it once it has been fully written. Make sure the collection process ignores the temp file extension or the temp folder location.
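
For example (the paths, extension and stream class here are just illustrative):

// write to a temporary name first
set tmpFile="c:\data\out\report.tmp"
set file=##class(%Stream.FileCharacter).%New()
set file.Filename=tmpFile
do file.WriteLine("...file contents...")
do file.%Save()

// then rename it, so the collection process only ever sees complete files
do ##class(%File).Rename(tmpFile,"c:\data\out\report.csv")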

Hi Nikita,

Sounds like an interesting plan.

I've developed my own desktop-grade UI widget library. I used to use ExtJS but got fed up with the price and the speed. I've got it to the stage where I can build the shell of an IDE that could be mistaken for a thick-client installation. If you right-click the image below and open it in a new tab you will see that it has all the features you would expect of an IDE: split-panel editing, draggable panels, accordions, trees, menus, border layouts etc., and an empty panel that needs a terminal!

I have syntax highlighting working for a variation of M that I have been working on for a few years. I can get syntax highlighting working for COS no problem (well, long form at least).

The hard stuff would be getting things like the Studio inspector working like-for-like. Lots of back-end COS required, etc.

I've still got a few months of work left on an RPC messaging solution for the UI, but once that's done I would be open to collaborating on a back-end implementation for Caché.

Sean.

I had some interesting results using just a TCP binding between Node and Caché.

With just a single Node process and a single Caché worker process I was able to process 1200 JSON-RPC 2.0 messages per second. This included Caché deserialising the JSON, calling its internal target method, writing and reading some random data, and then passing back a new JSON object. Adding a second Caché process nearly doubled that throughput.
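
To give a feel for what each message involved, the Caché side was doing something along these lines (a simplified sketch only, not the actual benchmark code; the class name and dispatch are illustrative and assume a Caché version with %DynamicObject support):

ClassMethod HandleMessage(pJson As %String) As %String
{
    // deserialise the incoming JSON-RPC 2.0 request
    set request=##class(%DynamicObject).%FromJSON(pJson)
    // call the internal target method with the supplied params
    set result=$ClassMethod("My.Service",request.method,request.params)
    // build and serialise the reply
    set response={"jsonrpc":"2.0"}
    set response.id=request.id
    set response.result=result
    quit response.%ToJSON()
}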

I was running Node, Caché and the stress-test tool on the same desktop machine, with lots of other programs running. I started to hit limits that seemed to be related to the test tool, so I wasn't sure how high I could take these benchmarks with this set-up.

Interestingly, when I bypassed Node and used a CSP page to handle the requests, I could only get the same test set-up to process 340 messages per second. This I couldn't understand. I am sure it was down to the test tool, but I could not work out how to push this higher. I would have expected Caché to spin up lots of processes and see more than the 1200 per second that a single process was limited to.

It did make me wonder whether, no matter how many processes you have, you can only really process two to three at a time per four CPU cores, and whether Node was just much faster at handling the initial HTTP requests, or whether spreading the load between the two was a good symbiotic performance gain. Still, I was not expecting such a big difference.

Now, I would have thought that if you put Node.js on one box and Caché on a second box, so they don't compete for the resources they each need most, the TCP connection would be much more efficient than binding Node and Caché in the same process on the same box?

Good to know QEWD has been using WebSockets for a long time. Socket.io is a well-maintained library with all of the fall-back features that I worry are missing with Caché sockets alone, which makes it a lot easier when putting Node.js in front of Caché. I guess I just want to avoid any additional moving parts, as I bash out as many web applications per year as I have users on some of them. That's why I never use the likes of Zen; I just need things simple and fast, and I typically avoid WebSockets for fear of headaches using them. But it is 2017, and we can only hope the NHS masses will soon move on to an evergreen browser.

Do you have benchmarks for QEWD? I have my own Node-to-Caché binding that I developed for an RPC solution, and I found I could get twice the throughput with marshalled TCP messages compared to CSP requests. But then I can never be sure with these types of benchmarks until they are put into a production environment.

Hi Nikita,

Thanks for your detailed response. Good to hear a success story with WebSockets. Many of the problems that I have read about are very much edge cases. Most firewalls seem to allow the outgoing TCP sockets because they run over 80 or 443, but there are fringe cases of a firewall blocking the traffic. Certain types of AV software can also block it. I suspect these problems are more prominent in the Node.js community because Node is more prevalent than Caché, and because Caché is more likely to be inside the firewall with its end users.

The main problem I still have is that I work on Caché and Ensemble inside healthcare organisations in the UK, and they are always behind on browser versions for various reasons. Only recently have I been able to stop developing applications that needed to work on IE6. Many are still on IE8 or IE9 (sometimes running in IE7 emulation mode). Either way, WebSockets only work on IE10+. I can work around most browser problems with polyfills, but sockets require both a client and a server solution. That means you can't just drop in SockJS as an automatic fall-back library, because there is no server-side implementation of it for Caché.

Without any such library built for Caché, I am thinking what is needed is a simple native emulation of the client socket library that falls back to a basic long-poll implementation against Caché. If I then hit a scalability problem, it would be time to put Node.js in front of Caché with all the additional admin overhead. A nice problem to have, all the same. Still, I suspect it would take a large number of users and sockets to swamp Caché's resources.
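
As a very rough sketch of what that long-poll fall-back could look like on the Caché side (the class name and the ^MessageQueue global are purely hypothetical), the server end might be little more than a CSP page like this:

Class My.LongPoll Extends %CSP.Page
{

ClassMethod OnPreHTTP() As %Boolean
{
    // reply as JSON
    Set %response.ContentType="application/json"
    Quit 1
}

ClassMethod OnPage() As %Status
{
    // hold the request open for up to ~25 seconds waiting for a queued message
    Set key=%session.SessionId
    For i=1:1:50 {
        Quit:$Data(^MessageQueue(key))
        Hang 0.5
    }
    // return whatever is queued (or an empty array) and clear it down
    Write $Get(^MessageQueue(key),"[]")
    Kill ^MessageQueue(key)
    Quit $$$OK
}

}

The client side would then just re-issue the request as soon as each response comes back, which is easy enough to wrap behind the same interface as the real socket library.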

Your WebTerminal looks very good. Not sure why I have not seen it before; it looks like something I would use. I'm not sure why we can't have a web-based IDE for Caché when I see a complex working web application such as this. I even have my own web-based IDE that I use for that other M database, not sure I can mention it here :), which I keep meaning to port to Caché.

Just a couple of glancing comments.

You are trying to set a parameter. I'm no ZEN expert, but I am pretty sure parameters are immutable in all classes.

The other thing: if I was doing this in CSP, setting the content type in the OnPage method would be too late, as the headers would already have been written. It has to be set before then. Not sure if Zen is similar, but I would override OnPreHTTP (or its equivalent) and set %response.ContentType=myCONTENTTYPE in that method.
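
For example, in a plain CSP page it would look like this (the content type shown is just an example):

ClassMethod OnPreHTTP() As %Boolean
{
    // the headers have not been sent yet at this point, so this takes effect
    Set %response.ContentType="application/pdf"
    Quit 1
}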

I remembered that the encode/decode limitation was 3.6MB, not 1MB, so I have corrected my original message.

Having built several EDT document solutions in Ensemble (sending thousands of documents a day), I have not had to code around this limitation.

But if you have documents that are bigger, take a look at GetFieldStreamBase64() on the original HL7 message. I've not used it, but it should be fairly simple to figure out. In that case you can use an Ens.StreamContainer to move the stream around.

Thinking about it, there is an even simpler solution: just send the HL7 message "as is" to the operation and do the extraction and decoding at the last second.

This is what I would do.

Create a custom process, extract the value using GetValueAt and put it into a string container. String containers are handy Ens.Request messages that you can use to move strings around without needing to create a custom Ens.Request class. Then just send it asynchronously to an operation that will decode the base64 and write it to a file. Two lines of code, nice and simple...

Class My.DocExtractor Extends Ens.BusinessProcess [ ClassType = persistent ]
{

Method OnRequest(pRequest As Ens.Request, Output pResponse As Ens.Response) As %Status
{
    // pull the base64 payload out of OBX(1) field 5, component 5 and wrap it in a string container
    Set msg=##class(Ens.StringContainer).%New(pRequest.GetValueAt("OBX(1):5.5"))
    // send it asynchronously to the operation that will decode it and write the file
    Quit ..SendRequestAsync("FILE OUT",msg,,"Send DOC as Base64 to a file writer")
}

}


To decode the base64 use this method inside your operation.

set decodedString=##class(%SYSTEM.Encryption).Base64Decode(pRequest.StringValue)
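
For completeness, a minimal sketch of the matching file-writing operation might look like this (the class name and output path are just placeholders; in a real production you would more likely build the file name dynamically or use a file outbound adapter):

Class My.DocFileWriter Extends Ens.BusinessOperation
{

Method OnMessage(pRequest As Ens.StringContainer, Output pResponse As Ens.Response) As %Status
{
    // decode the base64 payload back into the raw document bytes
    set decoded=##class(%SYSTEM.Encryption).Base64Decode(pRequest.StringValue)
    // write it straight to disk with a binary file stream
    set file=##class(%Stream.FileBinary).%New()
    set file.Filename="c:\out\document.pdf"
    do file.Write(decoded)
    quit file.%Save()
}

}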


Things to consider...

1. You mention the message is 2.3, but the MSH says 2.4.
2. If you set your inbound service to use either of the default schemas for these two versions then you will have a problem with the required EVN and PV1 segments.
3. Therefore you will need to create a custom schema that makes these segments optional.
4. The base64 decode method is limited to a string, so your PDF documents cannot be greater than 3.6MB (assuming large string support is enabled, which it is by default).
5. You probably don't want to decode the document into another message too soon; do it just before writing to the file.

T02 to ITK should just be a matter of creating a new transform and dragging the OBX(1):5.5 field onto the corresponding target field.

Just adding a leading space will stop the value from being left unescaped...

SELECT JSON_OBJECT('id': ' {{}')

Which you can add to the end as well...

SELECT JSON_OBJECT('id': '{{} ')

Which basically suggests that if a value starts with a { and ends with a } then the JSON_OBJECT function assumes it is a valid JSON object and does not escape it. For instance, this...

SELECT JSON_OBJECT('id': '{"hello":"world"}')

will output this...

{"id":{"hello":"world"}}

In some ways I would say this is valid/desired behaviour, except that perhaps there should be a special Caché type for raw JSON, with implicit handling of that type in these functions.

An alternative workaround that works is to use the CONCAT function to append a trailing space...

SELECT JSON_OBJECT('id':{fn CONCAT('{{}',' ')})

Which produces...

{"id":"{{} "}

Which on the original query would need to be...

SELECT JSON_OBJECT('idSQL':id, 'content': {fn CONCAT(content,' ')} ) FROM DocBook.block

The error message is heavily escaped; decoded, it would look like this...

{"Info":{"Error":"ErrorCode":"5001","ErrorMessage":"ERROR #5001: Cannot find Subject Area: 'SampleCube'"} } }

This error is only raised in the %ParseStatement method of the %DeepSee.Query.Parser class.

I'm at the limits of what I know on DeepSee, but taking this at face value, it looks like there is a missing cube called SampleCube?

Hi Everardo,

There are a couple of extra compilation steps required for a web method.

Each web method requires its own separate message descriptor class. This class contains the arguments of your method as properties of the class, e.g.
 

Property file As %Library.String(MAXLEN = "", XMLIO = "IN");
Property sql As %Library.String(MAXLEN = "", XMLIO = "IN");


This extra class is required to provide a concrete API for your web method. The web service description will project this class as a complex type that the calling service needs to adhere to.

What I think is happening is that when you have an argument called args..., the compiler is trying to compile
 

Property args... As %Library.String(MAXLEN = "", XMLIO = "IN");


Which would fail with an invalid member name error (which correlates with the 5130/5030 error code you have).

I think the main issue here is that there is nothing (to the best of my knowledge) in the SOAP specification that allows for variadic types.

Instead what you want is an argument type that can be projected as a list or an array, e.g.
 

ClassMethod GenerateFileFromSQL(file As %String, sql As %String, delimiter As %String = "", args As %ListOfDataTypes) As %String [ WebMethod ]


That will then be projected in the WSDL as a complex type with an unbounded maxOccurs, allowing the client to send any number of repeating XML elements for the args property.

If you pass args as a %ListOfDataTypes into your non-web method, then you will need to decide whether that method should have the same formal spec or whether to overload it; something like...
 

if $IsObject(args(1)),args(1).%IsA("%Library.ListOfDataTypes") {
  // called with a single list object (e.g. handed over from the web method)
  set list=args(1)
  for i=1:1:list.Count() {
    write !,list.GetAt(i)
  }
} else {
  // called with ordinary variadic arguments
  for i=1:1:args {
    write !,args(i)
  }
}
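
So, assuming the variadic method were wrapped up as something like My.Util.WriteArgs(args...) (a purely hypothetical name), it could then be called either way:

// with ordinary variadic arguments
do ##class(My.Util).WriteArgs("a","b","c")

// or with the single %ListOfDataTypes handed over by the web method
set list=##class(%Library.ListOfDataTypes).%New()
do list.Insert("a")
do list.Insert("b")
do ##class(My.Util).WriteArgs(list)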


Sean.

Hi Scott,

The %Stream package superseded the stream classes in the %Library package. If you look at the class documentation you will see in the descriptions that the %Library stream classes have been deprecated in favour of the %Stream variants. The only reason they still exist would be for legacy implementations.

The other difference is that one is a character stream and the other is a binary stream. As a general rule you should only write text to the character stream and non-text (e.g. images) to the binary stream. The main reason for this is to do with Unicode characters. You may not have seen issues writing text to %FileBinaryStream, but that might well be because your text didn't have any Unicode conversions going on.

Performance-wise, I'm not sure there would be much in it between the two. You can access the source code of both, and they both use the same underlying raw code for reading and writing files. If you benchmarked them I guess you would see a marginal difference, but not enough to question which one to use for best performance.
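
To illustrate the difference (the file paths are just placeholders):

// character stream: text, with any unicode translation handled for you
set txt=##class(%Stream.FileCharacter).%New()
set txt.Filename="c:\temp\notes.txt"
do txt.WriteLine("Some text, possibly with unicode characters")
do txt.%Save()

// binary stream: raw bytes, no character translation, ideal for images or PDFs
set bin=##class(%Stream.FileBinary).%New()
set bin.Filename="c:\temp\image.png"
// stand-in for real image bytes (a PNG header in this case)
do bin.Write($Char(137,80,78,71))
do bin.%Save()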

I wonder how you determined that the logIt code was the reason for messages slowing down? On the surface it should only have a small impact on message throughput. If messages are queueing up, it feels like this is just the first symptom of a wider performance issue. I guess you have monitored overall IO performance; if it's already under strain then this could be the straw that breaks the camel's back.

On a curious note, whilst you might have needed to log messages in eGate, I wonder why this would be necessary in Ensemble. Unless you are using in-memory messaging, all of your messages will be automatically logged internally, as well as being written to the transaction logs. By adding your own logging you are effectively writing the same message to disk not twice but three times. If you also have IO logging enabled on your operation then it will be four times, not to mention how many times the message was logged before it reached the operation. On top of that, if you have log trace events enabled in the production, then the IO overhead for just one message is going to thrash the disks more than it needs to. Multiply that across your production(s), and how well IO is (or is not) spread over the disks, and it is easy to see how a peak flow of messages can start to queue.

Another reason I see for messages queuing (due to IO thrashing) is poor indexes elsewhere in the production. A data store that worked fast in development will now be so large that even simple lookups hog the disks and flush out the memory cache, putting an exponential strain on everything else. Suddenly a simple bespoke logger feels like it's writing at the speed of a ZX Spectrum saving to a tape recorder.

Of course, you may well have a highly tuned system and production and all of this is just rambling spam from me. In which case, nine times out of ten, if I see messages queuing it's simply because the downstream system can't process messages as quickly as Ensemble can send them.

Sean.

I was trying to figure out if you had found a secret zip command on Windows, but realised from your code that you are using...

http://gnuwin32.sourceforge.net/packages/zip.htm

7-Zip has always been rock solid for me on Windows, and is well maintained. The above zip lib looks like it's almost 10 years old now?

Perhaps use a combination of both as per the HS.Util.Zip.Adapter class.

Hi Greg,

The only zip utility that I have come across is in Healthshare (core 10+).

If you have Healthshare then take a look at...

HS.Util.Zip.Adapter


If you don't have Healthshare then it's still easy enough to do via the command line with $zf...

https://docs.intersystems.com/latest/csp/docbook/DocBook.UI.Page.cls?KEY=RCOS_fzf-1

First, if you are on Windows then there is no built-in command-line zip tool outside of PowerShell. You will need to install 7-Zip (by the way, Healthshare defaults to 7-Zip on Windows as well). If you are on Linux then there is a built-in zip command, but you might choose to install 7-Zip there as well.

A couple of trip hazards.

If you are building the command line on Windows then 7-Zip will be installed under "Program Files", with a space, so you will need to wrap quotes around the exe path, which means doubling up the quotes inside a Caché string.

If you are unzipping to a directory, the directory needs to exist first. Take a look at CreateDirectoryChain on the %File class to make this easier to do.

A simple untested example...

ClassMethod ZipFile(pSourceFile As %String, pTargetFile As %String) As %Status
{
    // "a" adds to an archive; note the doubled quotes around the exe path
    set cmd="""C:\Program Files\7-Zip\7z.exe"" a "_pTargetFile_" "_pSourceFile
    set status=$zf(-1,cmd)
    if status=0 quit $$$OK
    quit $$$ERROR($$$GeneralError,"Failed to zip, reason code: "_status)
}
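
And going the other way, an equally untested sketch for unzipping into a directory, using CreateDirectoryChain to make sure the target exists first (the 7-Zip path is again assumed to be the default install location):

ClassMethod UnzipFile(pSourceFile As %String, pTargetDir As %String) As %Status
{
    // make sure the target directory exists before extracting into it
    if '##class(%File).CreateDirectoryChain(pTargetDir) quit $$$ERROR($$$GeneralError,"Could not create "_pTargetDir)
    // "x" extracts with full paths, -o sets the output directory, -y assumes yes on prompts
    set cmd="""C:\Program Files\7-Zip\7z.exe"" x "_pSourceFile_" -o"_pTargetDir_" -y"
    set status=$zf(-1,cmd)
    if status=0 quit $$$OK
    quit $$$ERROR($$$GeneralError,"Failed to unzip, reason code: "_status)
}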


For anyone landing here who is happy just to use gzip, there was a recent discussion here...

https://community.intersystems.com/post/there-option-export-globals-archive

Hope that helps.

Sean.