Question
· Jul 26, 2023

Any other options to Iterating entries in a global faster

Hi,

We have a global with 65 million entries. All we are doing is just iterating through each entry to find out the total no of entries. It is currently taking 110 minutes. Is this something normal? What can we do to speed up this process? Below is the program for just iterating each entry in the global.

s sub=" ,count=0
f{              
     s sub=$o(^YYY(sub)) q:sub=""
     s count= count +1
     }
     w!, "Total Count:"_ count
 

$ZV: Cache for Windows (x86-64) 2018.1.8 (Build 766) Wed Mar 8 2023 22:01:03 EST
Discussion (19)2
Log in or sign up to continue

110 minutes, it seems impossible, or, it's something way too wrong

USER>set ts = $zh f i=1:1:65000000 { set ^YYY(i)="somedata" } write !,"elapsed: ", $zh-ts 

elapsed: 11.126631

USER>s sub="", count = 0, ts = $zh for { set sub = $Order(^YYY(sub)) quit:sub=""  set count = count + 1 } write !,"elapsed: ", $zh-ts 

elapsed: 9.549079

Here result in seconds

Yes, for sure, my example is too simple, and too far from any real situation. 

And there are multiple issues that may happen with your data, it can be how it is stored, where it is stored, how much data in values. And it's difficult to suggest how to check it

Have a look at my series of articles about globals in the database, just for information, it may help understand something, what may go wrong

In any case, there is a right way to count objects, without counting all of them this way. Is using bitmap index, which you can use even if you have own storage, and do not use objects yet. You still able to build own bitmap index, and count items by this index will be at least 64000 times faster, whereas 64000 is just chunk size for bitmap, and speed will vary depends if you don't have much empty spaces between id's, which needs to be numeric

Thank you @Dmitry Maslennikov 

I have tried the same program and am getting an elapsed 15.1234 seconds to read.

But my real global has more than 6-8 subscripts and takes 110 minutes.

I'm creating a similar global with 6-8 subscripts and works fine and gets a faster response and takes only 15 to 16 seconds. But even though we traverse only one level in the $order program. So there are no issues in that subscript level.

Is there any other issue in my global? 
 

Hi @Alexey Maslov 

Thank you.

write $$AllocatedSize^%GSIZE($name(^realglobal))/1024/1024
46866.7890625
write $$AllocatedSize^%GSIZE($name(^test global))/1024/1024
4154.2734375

Real Global has a count of 30000000

Test global has a count of 45000000

Both global traverse only 1st level in $order.

Real global takes 90 minutes.

 Test Global takes 15 sec

It traverses only the first node even though it takes more time.

Now it is obvious that ^realglobal has 11.3 times the size of ^testglobal
while your ^testglobal seems to have all blocks in buffers in memory
this is rather unlikely for ^realglobal.
So repeated reloading of blocks from disk seems to cause the delay
If you have the chance to double or triple your global buffers
you most likely may see improved processing of your $O()

This is an important point. Depending on the size of each node, this $order loop could be touching virtually every block in the global. If you read the 4GB test global after setting it, you're reading from a warm buffer pool, whereas the 40GB production global is less likely to be buffered—hence the greater than 10x difference in time.

I don't have a good suggestion for how to make this loop run faster. $prefetchon might help a little. Rather, depending on why you need to perform this operation, I'd either cache the record count (which could then become a hot spot of its own), or maintain an index global (possibly a bitmap).

I ran your code against a production snapshot and got:
elapsed: 7365.54051
w count
57434071

and

elapsed: 10925.91166
w count
43841315

As you pointed this is a problem in approach.

This is on enterprise hardware so that is not the issue.

I did change your code to use a second subscript ^XXX("YYY",sub)

as an example of a better approach
w $Order(^XXX("YYY",""),-1)
57435810

If your enterprise still uses mechanical disks, that's still a factor that has to be considered

All the blocks required to be read to get through the list can be placed all around the disk/s. And it takes time. On a live system, with many changes, and when a lot of different data is stored, the next block can be far from the previous one. So, on mechanical discs, defragmentation is matter, and may slow the speed.

I don't know the character of your data, but if you have a lot of data, stored in the globals, it will require to read much more blocks, to even just count the items.

And most probably the easiest way to solve it, is just to use bitmap index.

While my global is quite simple, and it's size is about 1.1 GB, or around 18 bytes per record

USER>do ##class(%GlobalEdit).GetGlobalSize("/usr/irissys/mgr/user","YYY",.all,.used)

USER>zw all
all=1109

USER>w all*1024*1024/65000000
17.89031975384615385

After restarting, with a cold cache, I got 17 seconds

USER>s sub="", count = 0, ts = $zh for { set sub = $Order(^YYY(sub)) quit:sub=""  set count = count + 1 } write !,"elapsed: ", $zh-ts 

elapsed: 16.994676

So, my disk can read around 65MB per sec

Hi Rathinakumar,

One reason may be process contention for same block.

Most application processes work by $Ordering forward from the top of the global down.

When doing table scans, this can results in processes waiting, or running behind another process.

As an alternative for Support and Large Reporting Jobs you can instead "$Order UP" instead.

s sub=$o(^YYY(sub),-1) q:sub=""

Interested if this may mitigate any performance issue caused by contention on a busy system.

In the mean time the answer is already given by Mr Maslennikov and others, but to shorten your runtime,  you could also try a "variablenless" loop, maybe this gives you some seconds more... The emphasis is on "maybe".

// if you have null-subscripts enabled
if $d(^YYY(""))!1 for i=0:1 {quit:$order(@$zr)=""} write i

// if null-subscripts are not enabled
if $d(^YYY(-9E18))!1 for i=0:1 {quit:$order(@$zr)=""} write i

// of course, you can do that on any arbitrary level too
if $d(^YYY(1,2,3,""))!1 for i=0:1 {...} write i

// the value of -9E19 ist just for explanation, use a value suitable for your needs

May I ask for the background, why do you need that node count? 

Hi,

question is here how is that global being populated/created? Is this global defined by usage of a class inheriting from %Persistent?

If so, a fast way would be to be using %Extent to enumerate all instances and then look at the rowcount.

e.g.

set query = ##class(%SQL.Statement).%New()
set qStatus = query.%PrepareClassQuery("User.Person","Extent")
set rset=query.%Execute()
d rset.%Display()
zw %ROWCOUNT