Speeding up $listget

Question

David.Satorres6134 · Aug 7, 2018

Hi,

I'm trying to find the fastest way to get data from a class, and I find it very slow compared to traditional globals. So I hope some of you can shed some light on this :-)

I have thousands of records in a class, and to access them quickly I'm using $o on the index. From there, I get the values using $listget(). Something like this:

   s FromDateH = (+$h-1)
   set id=""   // start $order from the first id in the index
   for {
       set id=$order(^TestI("StartDateIDX",FromDateH,id))
       quit:id=""
       set dat=$lg(^TestD(id)) //dat=$lb("a","b","c","d","e")
   }

I find it quite slow, so I've tried the same code but changing the contents of ^TEST2 from a List to a String:

   s FromDateH = (+$h-1)
   set id=""   // start $order from the first id in the index
   for {
       set id=$order(^TESTINDEX("StartDateIDX",FromDateH,id))
       quit:id=""
       set dat=$g(^TEST2(id)) //dat = "a#b#c#d#e"
   }

I find that getting the data from a String (like "a#b#c#d#e") is 4 to 5 times faster than getting it from a List (like $lb("a","b","c","d","e")). When you are managing a few records it may not make a difference, but in my case it's having a huge impact.

Does anyone know how I can speed up $listget() so that the performance is similar to the old-fashioned MUMPS way of getting delimiter-separated values from a string?

Cheers! :-)

David Van De Griek · Aug 7, 2018

Once you have the $list value in dat, use $listnext rather than a loop with $listget.

For example:

set ptr = 0
while $listnext(dat,ptr,value) {
    <do something with your value in value>
}
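As a minimal sketch of the difference, using a local list rather than the globals from the question (the variable names are chosen here for illustration), the two loops below visit the same elements. An indexed $listget has to locate element i from the start of the list on every call, while $listnext keeps its position in ptr between calls:

```objectscript
 set dat = $listbuild("a","b","c","d","e")

 // Indexed access: each call walks the list from the start to reach element i
 for i=1:1:$listlength(dat) {
     set value = $listget(dat,i)
     write value," "
 }
 write !

 // Sequential access: ptr remembers the position between calls
 set ptr = 0
 while $listnext(dat,ptr,value) {
     write value," "
 }
 write !
```

This only affects the cost of walking through a list that is already in memory; it does not change how long it takes to read the global node itself.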

Eduard Lebedyuk · Aug 7, 2018

If you're sure that bigger ids are generated later, you can get only the first id from the index and after that iterate the data global directly:

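set FromDateH = (+$h-1)
// first id indexed for that date ("" if there is none)
set id = $order(^TestI("StartDateIDX",FromDateH,""))
// step back one node so the loop below also visits that first id
set:id'="" id = $order(^TestD(id),-1)
for {
    set id=$order(^TestD(id),1,dat)
    quit:id=""
    //dat=$lb("a","b","c","d","e")
}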

Also, you can use the three-argument form of $order to iterate and get the data in one command.

Finally, consider profiling a work-heavy system with %SYS.MONLBL to verify which lines consume the most time.

Jon Willeke · Aug 8, 2018

Compared to a delimited string, lists have the overhead of storing the length of each element, typically one extra byte. Numbers and Unicode characters are also stored differently, sometimes more efficiently, sometimes less. Otherwise, there is no difference between fetching a delimited string or a list.

The DataBlkRd and DataBlkBuf columns show that ^StringGlobal was read entirely from global buffers, whereas ^ListGlobal had to read over 9,000 blocks from disk. In each case, it seems that the global occupies about 17,000 blocks; about 136 MB, assuming 8 KB blocks.

I suggest that you do the following:

configure 256 MB or more of global buffers
restart the instance
run one of the tests twice
restart the instance
run the other test twice

Based on your numbers, the first runs will be cold, and should take a minute or two. The second runs should be essentially instantaneous.
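The size estimate behind the 256 MB suggestion can be reproduced with a few lines. This is only a back-of-the-envelope sketch: the block count is the approximate figure quoted above, and the 8 KB block size is an assumption.

```objectscript
 set blocks = 17000     // approximate number of blocks per global (from the figures above)
 set blockKB = 8        // assumed database block size in KB
 set sizeMB = blocks * blockKB / 1000
 write "one global: about ",sizeMB," MB",!
 write "fits in 256 MB of global buffers: ",(sizeMB < 256),!
```

With these inputs one global comes to about 136 MB, so 256 MB of buffers is enough to hold the global being tested once the first (cold) run has loaded it.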

Julius Kavay · Aug 7, 2018

You are comparing apples with oranges!

The line (your case 1):

set dat=$lg(^TestD(id)) //dat=$lb("a","b","c","d","e")

sets dat to the FIRST list item (if present, or to "", if the list item is NOT present) of ^TestD(id) but ONLY if ^TestD(id) is defined AND it is a Cache list.

Otherwise, you will get an <UNDEFINED> error if ^TestD(id) does not exist, or a <LIST> error if ^TestD(id) exists but its content is not a list!

The line (your case 2):

set dat=$g(^TEST2(id)) //dat = "a#b#c#d#e"

sets dat to the content of ^TEST2(id) , if it exists or to "", if there is no ^TEST2(id)
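A small sketch with local variables (the names list, str and missing are made up for illustration) shows the difference between the two functions without touching any global:

```objectscript
 set list = $listbuild("a","b","c","d","e")
 set str  = "a#b#c#d#e"

 write $listget(list),!            // first element only: a
 write $listlength($get(list)),!   // $get returns the whole list, which has 5 elements
 write $get(str),!                 // the whole string: a#b#c#d#e
 write $get(missing),!             // undefined variable: empty string, no error
```

So a fair comparison has to use $get (or a plain reference) on both globals, and only then split the value with $listget or $piece.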

score 0 · Answer 1 · 2018-08-08T04:26:05-04:00

Thanks David!

But the problem is that what takes a long time is getting the list values, not iterating over them.

score 0 · Answer 2 · 2018-08-08T04:30:04-04:00

Yes, sorry, my mistake. Actually the line should be:

set dat=$g(^TestD(id)) //dat=$lb("a","b","c","d","e")

compared to:

set dat=$g(^TEST2(id)) //dat = "a#b#c#d#e"

Thanks to Julius's suggestion, I ran the %SYS.MONLBL analysis tool, and clearly something goes wrong when getting the data from a list:

Routine	Line	GloRef	DataBlkRd	UpntBlkBuf	BpntBlkBuf	DataBlkBuf	RtnLine	Time	TotalTime	Code
Test.1	78	16823	9128	14129	14129	7742	16823	66.282935	66.282935	set dat2=$get(^ListGlobal(id))
Test.1	79	16823	0	1849	1849	16904	16823	0.062076	0.062076	set dat=$get(^StringGlobal(id))

score 0 · Answer 3 · 2018-08-08T05:36:57-04:00

$get works as fast as the global can be read. Some ideas:

You can use ^PERFMON to see how these two globals are read: from disk or from memory. To do that, collect two reports, one with only the ListGlobal load and one with only the StringGlobal load, and compare the physical reads.
Are both globals the same size?
How large are the globals relative to the global buffer?

score 0 · Answer 4 · 2018-08-09T11:58:45-04:00

I'm not sure I'm reading this correctly, but I believe the key difference in the cold runs is 10,399 vs 5,853, again suggesting that ^ListData went to disk more often.

I'm surprised that it makes such a big difference, but I suspect what's happening here is that your copy of ^ListData into ^StringData resulted in a more space-efficient organization. You might want to look at the packing column of a detailed report from the %GSIZE utility.

It's possible that something about your data causes $list to store it less efficiently, but your data hasn't convinced me of that. If you copied ^ListData unchanged into, say, ^ListData2, my guess is that you would see a similar improvement.
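The copy experiment could look roughly like this. It is a sketch only: ^ListData2 is the hypothetical target name from the paragraph above, and the kill is there just so the copy starts from an empty global.

```objectscript
 // ^ListData2 is a hypothetical scratch global; make sure the name is unused first
 kill ^ListData2
 // copy every node of ^ListData, unchanged, into ^ListData2
 merge ^ListData2 = ^ListData
 // then request a detailed report and compare the packing of both globals
 do ^%GSIZE
```

If the freshly written copy reads as fast as ^StringData in a cold run, the difference comes from how the original global is laid out on disk rather than from $list itself.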

score 0 · Answer 5 · 2018-08-08T10:36:06-04:00

You are doing disk block reads in the one case, which is why it is slower. How big is your global buffer pool? Also, how big are your globals ^TestD and ^TEST2? Use 'd ^%GSIZE' to find their sizes on disk. The $lb version will be slightly bigger, as each element takes a type byte and a length byte; this shows up when the data is very small, like these single-character ASCII elements. But $lb does mean you never need to worry about data that contains '#' characters, and it preserves types, whereas the "a#b#..." format needs to convert everything into a string before storing it, which adds runtime overhead too.
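Both points can be checked on local variables. The following is a minimal sketch with made-up variable names, not a measurement of the globals in question:

```objectscript
 set list = $listbuild("a","b","c","d","e")
 set str  = "a#b#c#d#e"
 write $length(list),!        // stored size of the list, including per-element overhead
 write $length(str),!         // 9 characters

 // A value that contains the delimiter breaks the delimited format...
 set str2 = "x#y" _ "#" _ "z"
 write $length(str2,"#"),!    // 3 pieces, although only 2 values were stored
 // ...but not the list format
 set list2 = $listbuild("x#y","z")
 write $listlength(list2),!   // 2
 write $list(list2,1),!       // x#y
```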

--

Mark

score 0 · Answer 6 · 2018-08-09T09:44:00-04:00

Finally I've run some tests. I have duplicated the list global, changing its values to strings, so I can compare two different globals with the same data but stored differently. The results show that accessing a list is much slower.

Routine	Line	GloRef	upper pointer block reads	bottom pointer block reads	data block reads	directory block requests satisfied from a global buffer	upper pointer block requests satisfied from a global buffer	bottom pointer block requests satisfied from a global buffer	data block requests satisfied from a global buffer	M commands	Time	TotalTime	Code
Test.1	78	171053	1	40	10399	1	14109	14070	165867	171053	43.32538	43.32538	set dat=$g(^ListData(id))
Test.1	78	171053	14110	14110	176266	0	0	0	0	171053	0.265694	0.265694	set dat=$g(^ListData(id))


Test.1	79	171053	1	23	5853	1	11607	11585	166642	171053	20.5958	20.5958	set dat=$g(^StringData(id))
Test.1	79	171053	11608	11608	172495	0	0	0	0	171053	0.237311	0.237311	set dat=$g(^StringData(id))

But finally, after reading a bit of the documentation, I found that I could improve the performance by changing the database block size from 8 KB to 64 KB. And it really worked:

Routine	Line	GloRef	upper pointer block reads	bottom pointer block reads	data block reads	directory block requests satisfied from a global buffer	upper pointer block requests satisfied from a global buffer	bottom pointer block requests satisfied from a global buffer	data block requests satisfied from a global buffer	M commands	Time	TotalTime	Code

Test.1	78	171053	0	1	1861	1	0	6642	169402	171053	7.234114	7.234114	set dat=$g(^ListData(id))
Test.1	78	171053						6643	171263	171053	0.225354	0.225354	set dat=$g(^ListData(id))

Test.1	79	171053			1808	1		6534	169420	171053	2.12363	2.12363	set dat=$g(^StringData(id))

So