Speeding up $listget

Hi,

I'm trying to find the fastest way to read data from a class, and I find it very slow compared to traditional globals. I hope some of you can shed some light on this :-)

I have thousands of records in a class, and to access them quickly I iterate over the index with $order. From there, I get the values using $listget(). Something like this:


    set FromDateH = (+$h-1)
    set id=""    // start $order from the first id under this date
    for {
        set id=$order(^TestI("StartDateIDX",FromDateH,id))
        quit:id=""
        set dat=$lg(^TestD(id))    //dat=$lb("a","b","c","d","e")
    }
    


I find it quite slow, so I've tried the same code but changing the contents of ^TEST from a List to a String:


    set FromDateH = (+$h-1)
    set id=""    // start $order from the first id under this date
    for {
        set id=$order(^TESTINDEX("StartDateIDX",FromDateH,id))
        quit:id=""
        set dat=$g(^TEST2(id))   //dat = "a#b#c#d#e"
    }

I find that getting the data from a String (like "a#b#c#d#e") is 4 to 5 times faster than getting it from a List (like $lb("a","b","c","d","e")). When you are managing a few records it may not make a difference, but in my case it's having a huge impact.

Does anyone know how I can speed up $listget() so that its performance is similar to the old-fashioned MUMPS way of reading delimiter-separated values?

Cheers! :-)


Answers

Once you have the $list value in dat, use $listnext rather than a loop with $listget. 

For example:

     set ptr = 0
     while $listnext(dat,ptr,value) {
         // do something with the current element in value
     }

Thanks David!

But the problem is that what takes a long time is fetching the list values from the global, not iterating over them.

If you're sure that bigger ids are generated later, you can get only the first id from the index and after that iterate the data global directly:

    set FromDateH = (+$h-1)
    // first id stored in the index for that date
    set id = $order(^TestI("StartDateIDX",FromDateH,""))
    // step back one node so that the loop below starts at that first id
    set id = $order(^TestD(id),-1)
    for {
        set id=$order(^TestD(id),1,dat)
        quit:id=""
        //dat=$lb("a","b","c","d","e")
    }

As shown above, you can use the three-argument form of $order to iterate and get the data in one command.

Finally, consider profiling the work-heavy code with %SYS.MONLBL to verify which lines consume the most time.

You are comparing apples with oranges!

The line  (your case 1):

set dat=$lg(^TestD(id))    //dat=$lb("a","b","c","d","e")

sets dat to the FIRST list item of ^TestD(id) (or to "" if that list item is NOT present), but ONLY if ^TestD(id) is defined AND it is a Cache list.

Otherwise, you will get an <UNDEFINED> error if ^TestD(id) does not exist, or a <LIST> error if ^TestD(id) exists but its content is not a list!

The line (your case 2):

 set dat=$g(^TEST2(id))   //dat = "a#b#c#d#e"

sets dat to the content of ^TEST2(id) if it exists, or to "" if there is no ^TEST2(id).
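To make the difference concrete, here is a small sketch. ^DemoLG is a hypothetical scratch global used only for this illustration; it is not one of the globals from the question.

```objectscript
    // ^DemoLG is a hypothetical scratch global, used only for this illustration
    kill ^DemoLG
    set ^DemoLG(1)=$listbuild("a","b","c","d","e")
    set ^DemoLG(2)="a#b#c#d#e"

    write $listget(^DemoLG(1)),!            // only the first list element
    write $listlength($get(^DemoLG(1))),!   // $get returns the whole list
    write $get(^DemoLG(2)),!                // the whole delimited string
    write ($get(^DemoLG(3))=""),!           // undefined node: $get returns ""

    // These two would fail, as described above:
    //   $listget(^DemoLG(2))   raises <LIST>      (content is not a list)
    //   $listget(^DemoLG(3))   raises <UNDEFINED> (node does not exist)
    kill ^DemoLG
```

So a timing comparison is only fair when both loops read the whole node, for example with $get in both cases.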

Yes, sorry, my mistake. Actually the line should be:

        set dat=$g(^TestD(id))    //dat=$lb("a","b","c","d","e")
 

compared to: 

        set dat=$g(^TEST2(id))   //dat = "a#b#c#d#e"
 

Thanks to Julius's suggestion, I ran the %SYS.MONLBL analysis tool, and clearly something is going wrong when getting the data from a list:

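| Routine | Line | GloRef | DataBlkRd | UpntBlkBuf | BpntBlkBuf | DataBlkBuf | RtnLine | Time | TotalTime | Code |
|---|---|---|---|---|---|---|---|---|---|---|
| Test.1 | 78 | 16823 | 9128 | 14129 | 14129 | 7742 | 16823 | 66.282935 | 66.282935 | set dat2=$get(^ListGlobal(id)) |
| Test.1 | 79 | 16823 | 0 | 1849 | 1849 | 16904 | 16823 | 0.062076 | 0.062076 | set dat=$get(^StringGlobal(id)) |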

$get is as fast as the global can be read. Some ideas:

  1. You can use ^PERFMON to see how these two globals are read: from disk or from memory. To do that, collect two reports, one with only the ListGlobal load and one with only the StringGlobal load, and compare the physical reads.
  2. Are both globals the same size?
  3. How big are the globals relative to the global buffer?

You are doing disk block reads in the one case, which is why it is slower. How big is your global buffer pool? Also, how big are your globals ^TestD and ^TEST2? Use 'd ^%GSIZE' to find their sizes on disk. The $lb version will be slightly bigger, as each element carries a type byte and a length byte; this shows up when the data is very small, like these single-character ASCII elements. But $lb does mean you never need to worry about data that contains '#' characters, and it preserves types, whereas the "a#b#..." format needs to convert everything into a string before storing it, which adds runtime overhead too.
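As a quick check of that per-element overhead, the two encodings from the question can be compared with $length. This is only a sketch for these five single-character values; real records will differ.

```objectscript
    set list=$listbuild("a","b","c","d","e")
    set str="a#b#c#d#e"
    // Each list element here takes a length byte, a type byte and one data byte
    write $length(list),!   // 5 elements * 3 bytes = 15
    // The delimited string is 5 characters plus 4 delimiters
    write $length(str),!    // 9
```

For tiny values like these the list is noticeably larger in relative terms, but the overhead stays at roughly two bytes per element as the values grow.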

-- 

Mark

Compared to a delimited string, lists have the overhead of storing the length of each element, typically one extra byte. Numbers and Unicode characters are also stored differently, sometimes more efficiently, sometimes less. Otherwise, there is no difference between fetching a delimited string or a list.

The DataBlkRd and DataBlkBuf columns show that ^StringGlobal was read entirely from global buffers, whereas ^ListGlobal had to read over 9,000 blocks from disk. In each case, it seems that the global occupies about 17,000 blocks; about 136 MB, assuming 8 KB blocks.

I suggest that you do the following:

  • configure 256 MB or more of global buffers
  • restart the instance
  • run one of the tests twice
  • restart the instance
  • run the other test twice

Based on your numbers, the first runs will be cold, and should take a minute or two. The second runs should be essentially instantaneous.

Finally I've run some tests. I have duplicated the list global, changing its values to strings, so I can compare two different globals with the same data but stored differently. The results show that accessing a list is much slower.

| Routine | Line | GloRef | upper pointer block reads | bottom pointer block reads | data block reads | directory block requests satisfied from a global buffer | upper pointer block requests satisfied from a global buffer | bottom pointer block requests satisfied from a global buffer | data block requests satisfied from a global buffer | M commands | Time | TotalTime | Code |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Test.1 | 78 | 171053 | 1 | 40 | 10399 | 1 | 14109 | 14070 | 165867 | 171053 | 43.32538 | 43.32538 | set dat=$g(^ListData(id)) |
| Test.1 | 78 | 171053 | 0 | 0 | 0 | 0 | 14110 | 14110 | 176266 | 171053 | 0.265694 | 0.265694 | set dat=$g(^ListData(id)) |
| Test.1 | 79 | 171053 | 1 | 23 | 5853 | 1 | 11607 | 11585 | 166642 | 171053 | 20.5958 | 20.5958 | set dat=$g(^StringData(id)) |
| Test.1 | 79 | 171053 | 0 | 0 | 0 | 0 | 11608 | 11608 | 172495 | 171053 | 0.237311 | 0.237311 | set dat=$g(^StringData(id)) |

But finally, after reading a bit of documentation, I found that I could improve the performance by changing the database block size from 8 KB to 64 KB. And it really worked:

| Routine | Line | GloRef | upper pointer block reads | bottom pointer block reads | data block reads | directory block requests satisfied from a global buffer | upper pointer block requests satisfied from a global buffer | bottom pointer block requests satisfied from a global buffer | data block requests satisfied from a global buffer | M commands | Time | TotalTime | Code |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Test.1 | 78 | 171053 | 0 | 1 | 1861 | 1 | 0 | 6642 | 169402 | 171053 | 7.234114 | 7.234114 | set dat=$g(^ListData(id)) |
| Test.1 | 78 | 171053 |  |  |  |  |  | 6643 | 171263 | 171053 | 0.225354 | 0.225354 | set dat=$g(^ListData(id)) |
              
Test.179171053  18081 65341694201710532.123632.12363set dat=$g(^StringData(id))

So

I'm not sure I'm reading this correctly, but I believe the key difference in the cold runs is 10,399 vs 5,853, again suggesting that ^ListData went to disk more often.

I'm surprised that it makes such a big difference, but I suspect what's happening here is that your copy of ^ListData into ^StringData resulted in a more space-efficient organization. You might want to look at the packing column of a detailed report from the %GSIZE utility.

It's possible that something about your data causes $list to store it less efficiently, but your data hasn't convinced me of that. If you copied ^ListData unchanged into, say, ^ListData2, my guess is that you would see a similar improvement.
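That copy experiment could be sketched as follows. ^ListData2 is just the hypothetical target name used above, and this assumes you have room in the database for a second copy of the data.

```objectscript
    // Copy ^ListData unchanged into a fresh global (hypothetical name ^ListData2)
    kill ^ListData2
    merge ^ListData2=^ListData

    // Then request a detailed report for both globals and compare
    // their block counts and packing
    do ^%GSIZE
```

If ^ListData2 ends up with clearly fewer blocks than ^ListData, the slowdown comes from how the original global is laid out on disk rather than from the $list format itself.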