Question David.Satorres6134 · Aug 7, 2018

Speeding up $listget

Hi,

I'm trying to find the fastest way to get data from a class, and I find it very slow compared to traditional globals. So I hope some of you can shed some light on this :-)

I have thousands of records in a class, and to access them quickly I'm using $order on the index. From there, I get the values using $listget(). Something like this:


    s FromDateH = (+$h-1)
    set id=""    // start the $order scan from the first id
    for {
        set id=$order(^TestI("StartDateIDX",FromDateH,id))
        quit:id=""
        set dat=$lg(^TestD(id))    //dat=$lb("a","b","c","d","e")
    }
    


I find it quite slow, so I tried the same code but changed the contents of ^TEST from a List to a String:


    s FromDateH = (+$h-1)
    set id=""    // start the $order scan from the first id
    for {
        set id=$order(^TESTINDEX("StartDateIDX",FromDateH,id))
        quit:id=""
        set dat=$g(^TEST2(id))   //dat = "a#b#c#d#e"
    }

I find that getting the data from a String (like "a#b#c#d#e") is 4 to 5 times faster than getting it from a List (like $lb("a","b","c","d","e")). When you are handling a few records it may not make a difference, but in my case it's having a huge impact.

Does anyone know how I can speed up $listget() so that its performance is similar to the old-fashioned MUMPS way of getting delimiter-separated values from a string?

Cheers! :-)

Comments

David Van De Griek · Aug 7, 2018

Once you have the $list value in dat, use $listnext rather than a loop with $listget. 

For example:

     set ptr = 0 while $listnext(dat,ptr,value) { <do something with your value in value> }

David.Satorres6134  Aug 8, 2018 to David Van De Griek

Thanks David!

But the problem is that what takes a long time is fetching the list values, not iterating over them.

David.Satorres6134  Aug 8, 2018 to Julius Kavay

Yes, sorry, my mistake. Actually the line should be:

        set dat=$g(^TestD(id))    //dat=$lb("a","b","c","d","e")
 

compared to: 

        set dat=$g(^TEST2(id))   //dat = "a#b#c#d#e"
 

Thanks to Julius's suggestion, I ran the %SYS.MONLBL analysis tool, and clearly something goes wrong when reading the data from a list:

| Routine | Line | GloRef | DataBlkRd | UpntBlkBuf | BpntBlkBuf | DataBlkBuf | RtnLine | Time | TotalTime | Code |
|---|---|---|---|---|---|---|---|---|---|---|
| Test.1 | 78 | 16823 | 9128 | 14129 | 14129 | 7742 | 16823 | 66.282935 | 66.282935 | set dat2=$get(^ListGlobal(id)) |
| Test.1 | 79 | 16823 | 0 | 1849 | 1849 | 16904 | 16823 | 0.062076 | 0.062076 | set dat=$get(^StringGlobal(id)) |

Eduard Lebedyuk · Aug 7, 2018

If you're sure that bigger ids are generated later, you only need to get the first id from the index and can then iterate over the data global directly:

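    set FromDateH = (+$h-1)
    // first id for that date, taken from the index
    set id = $order(^TestI("StartDateIDX",FromDateH,""))
    if id'="" {
        // step back one node so the loop below starts at that first id
        set id = $order(^TestD(id),-1)
        for {
            set id=$order(^TestD(id),1,dat)
            quit:id=""
            //dat=$lb("a","b","c","d","e")
        }
    }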

Also, you can use the three-argument form of $order to iterate and get the data in one command.

Finally, consider profiling the work-heavy code with %SYS.MONLBL to verify which lines consume the most time.

Eduard Lebedyuk  Aug 8, 2018 to David.Satorres6134

$get works as fast as the global can be read. Some ideas:

  1. You can use ^PERFMON to see how these two globals are read: from disk or from memory. To do that, collect two reports, one with only the ListGlobal load and one with only the StringGlobal load, and compare the physical reads.
  2. Are both globals the same size?
  3. How large are the globals relative to the global buffer?

Jon Willeke  Aug 9, 2018 to David.Satorres6134

I'm not sure I'm reading this correctly, but I believe the key difference in the cold runs is 10,399 vs 5,853, again suggesting that ^ListData went to disk more often.

I'm surprised that it makes such a big difference, but I suspect what's happening here is that your copy of ^ListData into ^StringData resulted in a more space-efficient organization. You might want to look at the packing column of a detailed report from the %GSIZE utility.

It's possible that something about your data causes $list to store it less efficiently, but your data hasn't convinced me of that. If you copied ^ListData unchanged into, say, ^ListData2, my guess is that you would see a similar improvement.

Mark Hanson  Aug 8, 2018 to David.Satorres6134

You are doing disk block reads in the one case, which is why it is slower. How big is your global buffer pool? Also, how big are your globals ^TestD and ^TEST2? Use 'd ^%GSIZE' to find their sizes on disk. The $lb version will be slightly bigger, as each element carries a type byte and a length byte; this shows up when the data is very small, like these single-character ASCII elements. But $lb does mean you never need to worry about data that contains '#' characters, and it preserves types, whereas the "a#b#..." format needs to convert everything into a string before storing it, which adds runtime overhead too.

-- 

Mark

Jon Willeke  Aug 8, 2018 to David.Satorres6134

Compared to a delimited string, lists have the overhead of storing the length of each element, typically one extra byte. Numbers and Unicode characters are also stored differently, sometimes more efficiently, sometimes less. Otherwise, there is no difference between fetching a delimited string or a list.
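
As a rough illustration of that per-element overhead, the sketch below compares the stored length of the two sample values from the question. The variable names list and str are made up for the example; for five single-character ASCII elements it should report 15 characters for the list against 9 for the delimited string.

```objectscript
    // Illustration only: same five values, encoded two ways
    set list = $listbuild("a","b","c","d","e")
    set str  = "a#b#c#d#e"
    // $length on a list value returns the length of its encoded form
    write "list:   ",$length(list)," characters",!
    write "string: ",$length(str)," characters",!
```

With longer elements the fixed per-element cost matters proportionally less.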

The DataBlkRd and DataBlkBuf columns show that ^StringGlobal was read entirely from global buffers, whereas ^ListGlobal had to read over 9,000 blocks from disk. In each case, it seems that the global occupies about 17,000 blocks; about 136 MB, assuming 8 KB blocks.

I suggest that you do the following:

  • configure 256 MB or more of global buffers
  • restart the instance
  • run one of the tests twice
  • restart the instance
  • run the other test twice

Based on your numbers, the first runs will be cold, and should take a minute or two. The second runs should be essentially instantaneous.
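
The "run the test twice" step could be sketched roughly as follows. This is only an assumed timing harness, not the code that was actually profiled: it walks ^ListGlobal (the name taken from the MONLBL output above) with three-argument $order and prints the elapsed time of each pass using $zhorolog.

```objectscript
    // Sketch: two consecutive passes over the same global.
    // The first pass after a restart is expected to be the cold one.
    for run=1:1:2 {
        set start = $zhorolog
        set id = ""
        for {
            // fetch the next subscript and its value in one command
            set id = $order(^ListGlobal(id),1,dat)
            quit:id=""
        }
        write "run ",run,": ",$zhorolog-start," s",!
    }
```

Swapping in ^StringGlobal after the second restart gives the matching pair of timings for the other global.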

David.Satorres6134  Aug 9, 2018 to David.Satorres6134

Finally I've run some tests. I duplicated the list global, changing its values to strings, so I can compare two different globals with the same data but stored differently. The results show that accessing a list is much slower.

| Routine | Line | GloRef | upper pointer block reads | bottom pointer block reads | data block reads | directory block requests satisfied from a global | upper pointer block requests satisfied from a global buffer | bottom pointer block requests satisfied from a global buffer | data block requests satisfied from a global buffer | M commands | Time | TotalTime | Code |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Test.1 | 78 | 171053 | 1 | 40 | 10399 | 1 | 14109 | 14070 | 165867 | 171053 | 43.32538 | 43.32538 | set dat=$g(^ListData(id)) |
| Test.1 | 78 | 171053 | 14110 | 14110 | 176266 | 0 | 0 | 0 | 0 | 171053 | 0.265694 | 0.265694 | set dat=$g(^ListData(id)) |
| Test.1 | 79 | 171053 | 1 | 23 | 5853 | 1 | 11607 | 11585 | 166642 | 171053 | 20.5958 | 20.5958 | set dat=$g(^StringData(id)) |
| Test.1 | 79 | 171053 | 11608 | 11608 | 172495 | 0 | 0 | 0 | 0 | 171053 | 0.237311 | 0.237311 | set dat=$g(^StringData(id)) |

But finally, after reading a bit of documentation, I found that I could improve the performance by changing the database block size from 8 KB to 64 KB. And it really worked:

| Routine | Line | GloRef | upper pointer block reads | bottom pointer block reads | data block reads | directory block requests satisfied from a global | upper pointer block requests satisfied from a global buffer | bottom pointer block requests satisfied from a global buffer | data block requests satisfied from a global buffer | M commands | Time | TotalTime | Code |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Test.1 | 78 | 171053 | 0 | 1 | 1861 | 1 | 0 | 6642 | 169402 | 171053 | 7.234114 | 7.234114 | set dat=$g(^ListData(id)) |
| Test.1 | 78 | 171053 |  |  |  |  |  | 6643 | 171263 | 171053 | 0.225354 | 0.225354 | set dat=$g(^ListData(id)) |
| Test.1 | 79 | 171053 |  |  | 1808 | 1 |  | 6534 | 169420 | 171053 | 2.12363 | 2.12363 | set dat=$g(^StringData(id)) |

So

Julius Kavay · Aug 7, 2018

You are comparing apples with oranges!

The line  (your case 1):

set dat=$lg(^TestD(id))    //dat=$lb("a","b","c","d","e")

sets dat to the FIRST list item (if present, or to "", if the list item is NOT present) of ^TestD(id)  but ONLY if ^TestD(id) is defined AND it is a Cache list.

Otherwise, you will get an <UNDEFINED> error if ^TestD(id) does not exist, or a <LIST> error if ^TestD(id) exists but its content is not a list!

The line (your case 2):

 set dat=$g(^TEST2(id))   //dat = "a#b#c#d#e"

sets dat to the content of ^TEST2(id) if it exists, or to "" if there is no ^TEST2(id).
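
To make the difference concrete, here is a small sketch using local variables instead of the globals; the names list and missing are made up for illustration. It shows that $listget extracts a single element, and that wrapping the read in $get avoids the error on an undefined node.

```objectscript
    set list = $listbuild("a","b","c","d","e")
    write $listget(list),!        // first element only: a
    write $listget(list,3),!      // third element: c
    write $listlength(list),!     // number of elements: 5
    kill missing
    // $get turns the undefined variable into "", and $listget on an
    // empty list returns the supplied default instead of failing
    write $listget($get(missing),1,"n/a"),!
```

For a like-for-like timing of the two globals, both reads would use plain $get, as in the corrected lines posted elsewhere in this thread.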
