OpenId() is slow, why?

Question

Question

Arto Alatalo · Apr 29, 2020

#Object Data Model #ObjectScript #Performance #Caché

Hi Community! I need your advice.

The method below is 6 times slower on production server than on developing machine:

ClassMethod runme4()
{
    s cnt=615210
    s st=$zh
    for i=1:1:cnt {
        s p=##class(digi.packet).%OpenId("packet||5237")
        w:p="" "not found",!
        k p
    }
    w $zh-st,!
}

BUT the server is 6 times faster if OpenId replaced with simple read of a large global (s p=^someLargeGlobal). Any ideas what makes OpenId so slow only on the server?

And another question: I wanted to profile the code with %Monitor.System.LineByLine but it gives me statistics for my code only. Any chance to include the library code too?

Thanks!

Some details:

The server is almost idle at the experiment time.
The database is exactly same.
Cache version is same (2017.2) but the server is Linux and dev pc is Windows.
digi.packet is a regular %Persistent class with nothing special

Discussion (27)5

Log in or sign up to continue

Vic Sun · Apr 29, 2020

The other commenters have good suggestions but I would recommend additionally collecting a pButtons to compare. Try and determine if there are any differences other than OS - hardware, memory configuration, global mappings, or anything else. We want to make sure we're comparing apples to apples! If everything seems to be the same, then you could reach out to the WRC who can help investigate this further.

1 0

score 2 · Answer 1 · 2020-04-29T04:09:25-04:00

A rule of thumb:
Global buffer size ~50% of physical memory (set it manually !)
- enable / enforce large pages
- run it in a loop.
- the first OPEN has to fetch the whole global structure from disk, all following load from Global buffers

score 0 · Answer 2 · 2020-04-29T04:50:01-04:00

Arto Alatalo · Apr 29, 2020

Thanks. What do you mean with "run it in a loop"?

0 0

score 1 · Answer 3 · 2020-04-29T04:57:00-04:00

Sorry I saw your FOR loop only when the reply was out already
So you are not hurt by First-Touch effect anyhow.
As it is always the same object even buffer size should have no influence.
is there any other load on the server that could have influence ?

score 0 · Answer 4 · 2020-04-29T06:14:05-04:00

>is there any other load on the server that could have influence ?

The load is first explanation I had in mind. But I've done the comparison several times with server load near to none.

score 0 · Answer 5 · 2020-04-29T15:36:11-04:00

One more thought:
Depending on the call structure of your methos/classmethods it's necessary to increase routine buffers too.

In our case 128 MB was too less. We configure to 256 MB on all production systems and it runs flawless since.

score 1 · Answer 6 · 2020-04-29T15:51:30-04:00

Hi Alexander,

how did you realize that you have to adjust the buffers? What kind of problems you had?

score 0 · Answer 7 · 2020-04-30T05:12:57-04:00

We did some execution time measurements and had extremely high times in some nested calls compared between two machines. The calls were between some very big routines. The most extreme has its object code splitted to 46 ^rOBJ subnodes and has a total size of >1MB.

You realize the difference only in very long loops (as in the sample above) and deep nested calls to big routines (BTW: classes are compiled to routines too).

The measurements were made on a system with high load so the routine buffer was quickly invalidated by other processes.

I hope this helps.

score 1 · Answer 8 · 2020-04-30T08:34:00-04:00

You can check the mgstat's RouLas (routine loads and saves) - see this article that includes discussion of sizing routine buffers, among many other tips for sizing a system's memory config!

https://community.intersystems.com/post/intersystems-data-platforms-and-performance-part-4-looking-memory

score 2 · Answer 9 · 2020-04-29T06:07:04-04:00

Hello @Arto Alatalo

If there is no overload on production server, perhaps can you perform a D ^%BENCHLANG on production and dev machine?

It's benchmark CPU against ObjectScript language.

Compare the COSMark result.

score 0 · Answer 10 · 2020-04-29T06:52:47-04:00

Good idea! Never heard about BENCHLANG before. The server is almost idle at night time so now I know what it can do in its spare time.

score 0 · Answer 11 · 2020-04-29T14:48:18-04:00

For sure there are tons of differences in the hardware! And the price is primary of them! The productions server cost is I guess like ten of my developing laptops. And this is one of the reasons I expect the server be fast :)

Thanks. Yes, I'm now automatically collecting execution time every 15 mins to check the performance overnight when all the users are gone.

score 0 · Answer 12 · 2020-04-29T13:51:23-04:00

Any particular reason your id is not an integer?

I'd think it's a parent-child relationship but why a parent id is a string?

score 0 · Answer 13 · 2020-04-29T14:51:20-04:00

Hi Eduard,

no, it is not parent-child relationship, it's composite primary key, and I have good reasons for it.

score 0 · Answer 14 · 2020-04-29T15:59:09-04:00

it's composite primary key, and I have good reasons for it.

Which are?

Can you compare the performance to index opening:

s p=##class(digi.packet).IDKEYOpen("packet", 5237)

score 0 · Answer 15 · 2020-04-29T16:05:49-04:00

Yes, I can try it but how do you mean it is related to my problem? My problem is that my laptop executes very same code against very same database 6 times faster than powerful idle server.

score 0 · Answer 16 · 2020-04-29T16:21:26-04:00

Without reports suggested by other users there's not much I can tell you.

What are global buffer sizes on prod/dev machines?

Preheat global cache by looping over data global and compare the timings after that.

List of utilities/reports that should help you:

pButtons – all info you might need, open the ticket with WRC with this info.
mgstat –main metrics
PERFMON – get most used globals/routines
GLOSTAT – system metrics
%SYS.MONLBL – line by line code profiling
GLOBUFF – global buffer analysis

Check this series of articles by @Murray Oldfield.

score 0 · Answer 17 · 2020-05-01T13:55:16-04:00

This IDKEYOpen looks as capricious beast. Because of some reasons digi.packet has no such method. But its sibling class do have it although in lowercase:

zidkeyOpen(K1,K2,concurrency=-1,sc) public { Set:'($data(sc)#2) sc=1
Quit ..%OpenId(K1_"||"_K2,concurrency,.sc) }

A few other classed have IDKEYOpen in upper case:

zIDKEYOpen(K1,concurrency=-1,sc) public { Set:'($data(sc)#2) sc=1
Quit ..%OpenId(.K1,concurrency,.sc) }

Looks as the compiler generates this method only in some cases. But why in lower case? Strange.

What for the performance, the method looks as a very simple wrapper around %OpenId, so no much performance difference to expect.

score 0 · Answer 18 · 2020-05-27T08:55:18-04:00

The name of the <index>Open method is based on the name of the index. I suspect that in one case the name of the IDKEY index is "IDKEY" and in another it is "idkey". No <index>Open method is generated if the index is not unique, primary or id.

score 0 · Answer 19 · 2020-06-05T02:16:31-04:00

Arto Alatalo · Jun 5, 2020

Oh yes, indeed. Thanks Daniel.

0 0

score 1 · Answer 20 · 2020-04-30T13:45:33-04:00

You are wondering about a SLOW production server and a FAST development machine?

I show you, how to turn your machines (production or development or both) into a SLOW and FAST systems by using your own measurement method.

ClassMethod Unclean()
{
   s p=##class(digi.packet).%OpenId("packet||5237")
   d ..runme4()
} 

ClassMethod Clean()
{
   d ..runme4()
}

Login to either of your systems and try the abowe methods in a terminal session. On my local system (of course, with an own persistent class) the one method is 12 times faster then the other.

Maybe you have an "Unclean" situation on your development system...

Also, sometime it is worth to put a line like

do $system.OBJ.ShowObjects()

as a first line into the runme4() method

score 0 · Answer 21 · 2020-04-30T13:53:02-04:00

No, this is not my case :) I've exact copy of the test method on each machine.

I can see some difference between the methods: with p existing, each next Open spends some time checking if the object already open. But 12 times... I wonder what makes it.

Thanks for your time.

score 1 · Answer 22 · 2020-04-30T14:05:54-04:00

The %OpenId() method checks, if the object to be opened ALREADY open (in the current session/job). If it's open then it just returns with an success. In other case, it loads it from the disk.

If on your development system, for whatever reason, there is an open instance then you save 6.1 million times the "load" operation.

score 2 · Answer 23 · 2020-05-01T10:50:53-04:00

You wrote

BUT the server is 6 times faster if OpenId replaced with simple read of a large global (s p=^someLargeGlobal). Any ideas what makes OpenId so slow only on the server?

while not specifically answering your general question, note that opening an object is very different than

Set p=^SomeLargeGlobal.

When you call %OpenId

Concurrency is checked. While not widely used there is a second parameter to the %OpenId method that controls the behavior of concurrency and you may want to consider providing the desired concurrency option. In the past if I was not concerned with concurrency as I just wanted to read data I would use option 0.
Each and every property defined in your class is loaded into memory, ie the data is read from the global and then the individual properties are set accordingly. Depending on how complex your class is the difference between

Set p=^SomeLargeGlobal

and %OpenId can be quite different, at least academically.

If your class as relationships and/or serial/embedded classes I do not believe those properties are fully loaded into memory, but I could be incorrect.

In practice, if I need an object I use objects, if I need only a small part of the object/row I'll use embedded SQL for better performance.

Again this does not specifically answer your general question but I think it is useful to understand what %OpenId does and why it's not the same as

Set p=^SomeLargeGlobal.

score 0 · Answer 24 · 2020-05-01T11:27:16-04:00

Yes, for sure pure global reading is not the same as Open. With this test I just wanted to make sure the slowness is not related to global access speed.

score 0 · Answer 25 · 2020-05-01T11:51:59-04:00

You wrote:

if I was not concerned with concurrency as I just wanted to read data I would use option 0.

Just to see how much gain I can have, I compared options 0 and 1, and the diff is surprisingly small: the test with 0 takes 34 secs, and with 1 takes 35. I would expect a bigger difference.

score 0 · Answer 26 · 2020-05-27T09:08:42-04:00

The difference between runs can be caused simply by normal variations in system load. The difference between Concurrency = 0 and Concurrency = 1 is only present when the object is stored in multiple global nodes. There are five possible values that are described in the %Library.Persistent class documentation (viewable through the SMP). Concurrency = 1 is referred to as "atomic read". For single node objects there is no extra work required for atomic read.