Traverse Global Subscripts while using Indirection

Hi all, 

I am trying to create a method to count the number of entries in a global, including all subscripts. I am having a bit of trouble getting the code to make it to the second subscript. When I get to the position where my key is "Canada" and I add a comma and empty quotes to it, it returns USA as the new key when I do the order function. Is the $Order or the global not able to use a single string to represent multiple subscripts?

 

Here is my global structure:

^Locations("Canada",1)="Montreal"
^Locations("Canada",2)="Vancouver"
^Locations("USA",1)="Michigan"
^Locations("USA",2)="Ohio"
^Locations("USA",3)="Florida"

 

Here is my method:

ClassMethod RecursiveGlobalCount(pGlobalName As %String, pKey As %String, pCount As %Integer) As %Integer
{
///if pCount is not populated set to zero for first run
if $Data(pCount)=0
{
set tCount = 0
}
else
{
set tCount = pCount
}

//pKey should only be undefined on first run
if $Data(pKey)=0
{
set tKey = $Order(@pGlobalName@(""))
w $Data(@pGlobalName@(tKey))
while tKey'=""
{
//check to see if global has descendents
if $Data(@pGlobalName@(tKey))=10
{
do ..RecursiveGlobalCount(pGlobalName,tKey,tCount)

}
set tKey = $Order(@pGlobalName@(tKey))
set tCount=tCount+1
}

}
else
{
set tKey = pKey


if ($Data(@pGlobalName@(tKey))=1)
{
set tKey=""
set tKey = $Order(@pGlobalName@(""))
while tKey'=""
{
set tKey = $Order(@pGlobalName@(tKey))
set tCount=tCount+1
}
}
elseif ($Data(@pGlobalName@(tKey))=10)
{
set tKey=pKey_","""""
set tKey = $Order(@pGlobalName@(tKey))
while tKey'=""
{
set tKey = $Order(@pGlobalName@(tKey))
set tCount=tCount+1
}
}
}
}

Answers


The following will count the number of data nodes under a given ^Locations(country):

USER>S G1=$NA(^Locations("Canada"))
USER>S G=$E(G1,1,$L(G1)-1)
USER>W G
^Locations("Canada"
USER>W G1
^Locations("Canada")
USER>F  S G1=$Q(@G1) Q:G1=""!($E(G1,1,$L(G))'=G)  S CT=$I(CT)
USER>W CT
2

 

 

With the code below, you can count all nodes.

USER>S COUNTRY="",COUNT=0,NIV=""
USER>F  S COUNTRY=$O(^Locations(COUNTRY)) Q:COUNTRY=""  F  S NIV=$O(^Locations(COUNTRY,NIV)) Q:NIV=""  S COUNT=COUNT+1
USER>W COUNT
5
 

Count only USA

USER>S COUNTRY="USA",COUNT=0,NIV=""
USER>F  S NIV=$O(^Locations(COUNTRY,NIV)) Q:NIV=""  S COUNT=COUNT+1
USER>W COUNT
3
 

Count only Canada

USER>S COUNTRY="Canada",COUNT=0,NIV=""
USER>F  S NIV=$O(^Locations(COUNTRY,NIV)) Q:NIV=""  S COUNT=COUNT+1
USER>W COUNT
2
 

Hi Flávio,

That works with this particular global, but the method I am trying to create would ideally accept any global with any number of subscripts. 

$Query is the function that will traverse the global. Below is my version of the task at hand, with testing.

Class Test.Test1 Extends (%RegisteredObject, %XML.Adaptor)
{

ClassMethod RecursiveGlobalCount(pGlobalName As %String, pKey As %String, pCount As %Integer) As %Integer
{
    /// pCount is zero if not provided
    Set pCount = +$Get(pCount)
    Set tKey = $Get(pKey)

    //pKey should only be undefined on first run
    Set tKey = $Query(@pGlobalName@(tKey))
    If (tKey = "") Quit pCount // nothing to count; avoids $Query(@"") below
    Set pCount = 1 + pCount // setting tKey got the first node

    For {
        Set tKey = $Query(@tKey)
        If ((tKey = "") || (tKey '[ pKey)) Quit
        Set pCount = 1 + pCount
    }
    Quit pCount
}

}

Testing:

    Write  ##class(Test.Test1).RecursiveGlobalCount("^Locations","",0)

    5

    Write  ##class(Test.Test1).RecursiveGlobalCount("^Locations","Canada",0)

    2

    Write ##class(Test.Test1).RecursiveGlobalCount("^Locations","USA",0)
    3
 

Hi Alan, 

Thanks! That works exactly like I was thinking. Can you change your comment to an answer, or post again as an answer so I can accept it? Thanks again!

When you're using subscript indirection with a recursive $order traversal, you may find the $name function useful; e.g.,

    do ..RecursiveGlobalCount($na(@pGlobalName@(tKey)),"",.tCount)

As the other answers suggest, you probably want $query instead of $order, but $order can be useful for summarizing on multiple subscript levels (e.g., count, min, and max per country, state, and city).
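
To illustrate both points, here is a minimal sketch; the method names CountBelow and Summarize are hypothetical and not from the original post. With subscript indirection, the variable inside the parentheses is always treated as one subscript value, so a string like "Canada" followed by a comma and empty quotes is looked up as a single first-level subscript, which is why $order moved on to "USA". Building the deeper reference with $name avoids that, and a $order loop at the top level gives one total per first-level subscript.

```objectscript
/// Hypothetical helper, not from the thread:
/// counts the data nodes at and below pRef.
/// pRef is a full reference such as $name(^Locations("USA")).
ClassMethod CountBelow(pRef As %String) As %Integer
{
    // $data returns 1 or 11 when the node itself holds a value
    set tCount = $data(@pRef) # 10
    set tSub = ""
    for {
        // tSub is always ONE subscript value at this level
        set tSub = $order(@pRef@(tSub))
        quit:tSub=""
        // $name folds the new subscript into the reference for the next level
        set tCount = tCount + ..CountBelow($name(@pRef@(tSub)))
    }
    quit tCount
}

/// Hypothetical helper: writes one total per first-level subscript of pRef.
ClassMethod Summarize(pRef As %String)
{
    set tSub = ""
    for {
        set tSub = $order(@pRef@(tSub))
        quit:tSub=""
        write tSub, ": ", ..CountBelow($name(@pRef@(tSub))), !
    }
}
```

Called as do ..Summarize($name(^Locations)) from the same class, this should print a count for "Canada" and one for "USA". For a plain total, $query is likely the faster choice, as noted above.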

Class community.counter Extends %RegisteredObject
{
/// Example:
/// set ^x(1)=111
/// set ^x(3,5)=222
/// set ^x(3,7)=333
/// 
/// The above global has 5 nodes:
/// ^x without a value
/// ^x(1) with value
/// ^x(3) without a value
/// ^x(3,5) with value
/// ^x(3,7) with value
/// 
/// write ##class(community.counter).CountQ($name(^x)) --> 3
/// write ##class(community.counter).CountR($name(^x)) --> 3
/// 
/// Using your example:
/// write ##class(community.counter).CountQ($name(^Locations)) --> 5
/// write ##class(community.counter).CountQ($name(^Locations("USA"))) --> 3
/// 
/// 
/// N.B.
/// Recursion is a tricky thing!
/// It helps one to get a clearly laid out solution
/// but you should take care about runtimes.
/// 
/// CountQ(...) is about 4-5 times faster than CountR(...)
/// 
/// --------------------------------------------------------
/// 
/// Return the count of nodes of a global- or a local variable
/// which have a value, using $QUERY() function
/// 
/// node:
/// a local or global variable, example: $na(^myGlobal), $na(abc)
/// or a local or global reference example: $na(^myGlobal(1,2))
/// 
ClassMethod CountQ(node) As %Integer
{
 if $data(@node)#10 { set sum=1 } else { set sum=0 }
 while 1 { set node=$query(@node) quit:node=""  if $increment(sum) }
 quit sum
}

/// Return the count of nodes of a global- or a local variable
/// which have a value, using recursion
/// 
/// node:
/// a local or global variable, example: $na(^myGlobal), $na(abc)
/// or a local or global reference example: $na(^myGlobal(1,2))
///       
ClassMethod CountR(node) As %Integer
{
 set sum=0
 do ..nodeCnt($name(@node), .sum)
 quit sum
}

ClassMethod nodeCnt(ref, ByRef sum) As %Integer [ Internal, Private ]
{
 if $data(@ref)#10, $increment(sum)
 set i=""
 while 1 { set i=$order(@ref@(i)) quit:i=""  do ..nodeCnt($na(@ref@(i)),.sum) }
}

}

Hi.

 

Julius's CountQ function will give a wrong answer for ^Locations("Canada"), since it will also count the "USA" nodes.

 

Here is code that will do the trick:

ClassMethod Count(node)
{
    S QLen=$QL(node) S:QLen Keys=$QS(node,QLen)
    F Count=0:1 S node=$Query(@node) Q:(node="")||(QLen&&($QS(node,QLen)'=Keys))
    Quit Count
}

W ##class(Yaron.test).Count($name(^Locations))
5

w ##class(Yaron.test).Count($name(^Locations("USA")))
3

w ##class(Yaron.test).Count($name(^Locations("Canada")))
2

 

 

Yes, you are right; thank you for the hint. One should never add an alternate function without testing it!

The correct form is:

ClassMethod CountQ(node) As %Integer
{
  set end=node
  if $data(@node)#10 { set sum=1 } else { set sum=0 }
  while 1 { set node=$query(@node) quit:(node="")||($name(@node,$qlength(end))'=end)  if $increment(sum) }
  quit sum
}
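
For readers unfamiliar with the functions used in that quit condition, here is a small sketch (my own illustration, not part of the answer above) of how $qlength and the two-argument form of $name behave on a reference string. Truncating each node returned by $query to the subscript depth of the starting reference and comparing it with that starting reference is what keeps the loop inside the subtree.

```objectscript
 ; Illustration only: ref is an arbitrary example reference
 set ref=$name(^Locations("Canada",2))
 write $qlength(ref),!        ; number of subscripts: 2
 write $name(@ref,1),!        ; truncated to one subscript: ^Locations("Canada")
 write $name(@ref,0),!        ; truncated to the name only: ^Locations
 write $qsubscript(ref,1),!   ; value of the first subscript: Canada
```

Note that the comparison assumes the starting reference is in the canonical form produced by $name, as in the examples above.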

I noticed that a couple of folks changed the original:
set tCount=tCount+1
to:
if $increment(sum) 

I wondered if that was in fact a performance improvement, so I wrote:

 s lim=1000000
 s start=+$p($now(),",",2)
 s count=0
 for i=1:1:lim { s count=count+1 }
 w count_" count=count+1: "_((+$p($now(),",",2))-start)_" seconds",! s start=+$p($now(),",",2)
 s count=0
 for i=1:1:lim { s count=1+count }
 w count_" count=1+count: "_((+$p($now(),",",2))-start)_" seconds",!
 s start=+$p($now(),",",2)
 s count=0
 for i=1:1:lim { if $i(count) } 
 w count_" if $i(count):  "_((+$p($now(),",",2))-start)_" seconds",!

The result is:
1000000 count=count+1: .010256 seconds
1000000 count=1+count: .008554 seconds
1000000 if $i(count):  .024483 seconds

So, "s count=1+count" is a little faster than "s count=count+1", but 3 times faster than "if $i(count)".

set count=count+1 and set count=1+count generate identical object code, so I think we found your margin of error.

if $increment(count) has to set $test, so I would expect it to be slower for a local variable. (I'm not sure about a global.) In IRIS 2018.2 and later, do $increment(count) may close the gap a bit.

Also, if you use $I() or $seq() for ID generation, it's not comparable with count=count+1, because $I is not reversed by transaction rollbacks.

With time measurements, keep in mind:

- usually, you are not alone on a Cache server.
  There are many other processes; some of them belong to Cache, others to the OS

- the time resolution (whatever you use: $now(), $zh) is also limited

- it also depends on how long your measurement runs (you are not alone!)
 

This is my short test routine:

Times(iter=1E3,count=4) ; show times

    w ?3,"count   num+1   1+num   =$i()    $i()",!
    w ?15,"times in microseconds",!
    w $tr($j("",40)," ",-1),!
    
    f i=1:1:count d time(iter) s iter=iter*10
    q
    
time(iter)
{
    s f=1E6/iter // factor for "one operation in microseconds"
    
    w $j(iter,8)
    s num=0,t=$zh f i=1:1:iter { s num=num+1 } d t($zh-t*f)
    s num=0,t=$zh f i=1:1:iter { s num=1+num } d t($zh-t*f)
    
    s num=0,t=$zh f i=1:1:iter { s num=$i(num) } d t($zh-t*f)
    s num=0,t=$zh f i=1:1:iter { i $i(num) } d t($zh-t*f)
    w !
}

t(t)
{
    w $j(t,8,3)
}


and this is the output


USER>d ^Times(1,8)
   count   num+1   1+num   =$i()    $i()
               times in microseconds
----------------------------------------
       1   2.000   1.000   2.000   1.000
      10   0.100   0.100   0.100   0.200
     100   0.030   0.030   0.080   0.080
    1000   0.044   0.042   0.088   0.090
   10000   0.028   0.028   0.075   0.077
  100000   0.027   0.027   0.064   0.050
 1000000   0.018   0.014   0.031   0.032
10000000   0.011   0.011   0.031   0.032

USER>d ^Times(1,8)
   count   num+1   1+num   =$i()    $i()
               times in microseconds
----------------------------------------
       1   4.000   0.000   2.000   1.000
      10   0.100   0.100   0.100   0.100
     100   0.040   0.030   0.080   0.580
    1000   0.044   0.041   0.088   0.088
   10000   0.028   0.028   0.075   0.077
  100000   0.027   0.027   0.073   0.076
 1000000   0.027   0.021   0.032   0.032
10000000   0.011   0.011   0.031   0.032

USER>d ^Times(1,8)
   count   num+1   1+num   =$i()    $i()
               times in microseconds
----------------------------------------
       1   3.000   1.000   2.000   1.000
      10   0.100   0.000   0.100   0.100
     100   0.040   0.030   0.080   0.590
    1000   0.045   0.041   0.088   0.090
   10000   0.028   0.028   0.075   0.077
  100000   0.027   0.027   0.073   0.075
 1000000   0.015   0.012   0.031   0.032
10000000   0.011   0.011   0.031   0.032

USER>

USER>

USER>d ^Times(1,8)
   count   num+1   1+num   =$i()    $i()
               times in microseconds
----------------------------------------
       1   3.000   0.000   3.000   1.000
      10   0.100   0.000   0.100   0.100
     100   0.030   0.030   0.080   0.630
    1000   0.046   0.042   0.088   0.090
   10000   0.028   0.028   0.075   0.077
  100000   0.027   0.027   0.073   0.075
 1000000   0.014   0.012   0.032   0.032
10000000   0.011   0.011   0.031   0.032

USER>

I consider time measurements only rough approximations.
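
One common way to reduce that noise is to repeat each measurement and keep the fastest run, on the assumption that the fastest run is the one least disturbed by other processes. A minimal sketch of the idea follows; the routine name MinTime and its parameters are hypothetical and not part of the Times routine above.

```objectscript
MinTime(iter=1E6,runs=5) ; hypothetical routine: best-of-N timing of one loop
    ; Run the same loop several times and keep the smallest elapsed time.
    s best=""
    f r=1:1:runs {
        s num=0,t=$zh
        f i=1:1:iter { s num=num+1 }
        s e=$zh-t
        ; keep the first result, then any faster one
        i (best="")||(e<best) { s best=e }
    }
    w "best of ",runs," runs: ",$j(best,8,6)," seconds",!
    q
```

Even the best-of-N figure is still only an approximation, since the timer resolution limit mentioned above applies to every run.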

Interesting. I added a loop for if $increment(num) {} (i.e., a new-style if statement that doesn't set $test): no measurable improvement over legacy if.

I also added a loop for do $increment(num) (i.e., a do statement that neither sets $test nor returns a value): ever so slightly slower.

USER>d ^Times(1,8)
   count   num+1   1+num   =$i()    $i()  $i(){}  d $i()
               times in microseconds
--------------------------------------------------------
       1   1.000   0.000   1.000   0.000   0.000   1.000
      10   0.000   0.000   0.100   0.100   0.000   0.000
     100   0.010   0.010   0.490   0.030   0.030   0.040
    1000   0.042   0.011   0.029   0.034   0.029   0.033
   10000   0.011   0.010   0.030   0.032   0.032   0.031
  100000   0.009   0.010   0.030   0.028   0.027   0.031
 1000000   0.009   0.010   0.028   0.028   0.027   0.031
10000000   0.010   0.010   0.028   0.028   0.028   0.031

Incidentally, here are some results with num renamed to ^num:

USER>d ^Times(1,8)
   count   num+1   1+num   =$i()    $i()  $i(){}  d $i()
               times in microseconds
--------------------------------------------------------
       1   2.000   0.000   2.000   1.000   0.000   0.000
      10   0.100   0.200   0.100   0.100   0.100   0.100
     100   1.070   0.280   0.130   0.110   0.100   0.110
    1000   0.142   0.144   0.142   0.102   0.102   0.106
   10000   0.142   0.141   0.110   0.116   0.104   0.108
  100000   0.142   0.141   0.102   0.101   0.100   0.104
 1000000   0.139   0.140   0.100   0.098   0.100   0.102
10000000   0.138   0.138   0.098   0.098   0.099   0.102

For "=$i()", I assigned a local, rather than redundantly assigning the global.

Timings are always variable, but the general trends are clear ("count=1+count" still wins).

I added a $seq test:

 SET $SEQ(^myseq)=1
 for i=1:1:lim { if $SEQ(^myseq) }
 w count_" if $SEQ(^myseq)): "_((+$p($now(),",",2))-start)_" seconds",!

Results:

1000000 count=count+1:     .010362 seconds
1000000 count=1+count:     .007998 seconds
1000000 if $i(count):      .025006 seconds
1000000 if $SEQ(^myseq)):  .099028 seconds

Do you really think it makes a difference if my routine contains "set xx=xx+1" instead of "set xx=1+xx"?

If yes, try the following:

Times2 ; execution time measurement

  s num=0,t=$zh f i=1:1:1E6 { s num=num+1 } w $j($zh-t,8,6),!
  s num=0,t=$zh f i=1:1:1E6 { s num=num+1 } w $j($zh-t,8,6),!
  q

my output values are

USER>d ^Times2
0.047048
0.038218

USER>d ^Times2
0.034727
0.035160

USER>d ^Times2
0.044252
0.036175

USER>d ^Times2
0.045639
0.035366

Both loops are exactly the same! And now, please explain why the times in some runs differ by more than 20%?

Sorry I was hasty in my judgement of  "num+1".  "1+num" is not faster.

I am not a performance/benchmark expert, but, as noted earlier, timings will vary because the OS is doing other things.

Increasing the loop from 1E6 to 1E7, and repeating/alternating the tests in the program, my laptop was fairly consistent:

+1: 0.204812
1+: 0.201091
+1: 0.201526
1+: 0.207091
+1: 0.20308
1+: 0.201488
+1: 0.202613
1+: 0.202009