﻿ Best way to compare two arrays for "equality" (same subscripts and values)

# Best way to compare two arrays for "equality" (same subscripts and values)

I've been trying to write a method to compare two local variables, which may be arrays, for "equality" - that is, to see if they have all the same subscripts (if they're arrays) and values. This is the best I've come up with so far - are there any better/simpler approaches out there?

``````/// Returns true if arrays <var>first</var> and <var>second</var> have all the same subscripts and all
/// the same values at those subscripts. <br />
/// If <var>first</var> and <var>second</var> both happen to be either undefined or unsubscripted variables,
/// returns true if they're both undefined or have the same value.<br />
/// <var>pMessage</var> has details of the first difference found, if any.
Method CompareArrays(ByRef first, ByRef second, Output pMessage) As %Boolean [ ProcedureBlock = 0 ]
{
New tEqual,tRef1,tRef2,tRef1Data,tRef1Value,tRef2Data,tRef2Value

Set pMessage = ""
Set tEqual = 1
Set tRef1 = "first"
Set tRef2 = "second"
While (tRef1 '= "") || (tRef2 '= "") {
#; See if the subscript is the same for both arrays.
#; If not, one of them has a subscript the other doesn't, and they're not equal.
If (\$Piece(tRef1,"first",2) '= \$Piece(tRef2,"second",2)) {
Set tEqual = 0
Set pMessage = "Different subscripts encountered by \$Query: "_
\$Case(tRef1,"":"<end>",:tRef1)_"; "_\$Case(tRef2,"":"<end>",:tRef2)
Quit
}

Kill tRef1Value,tRef2Value
Set tRef1Data = \$Data(@tRef1,tRef1Value)
Set tRef2Data = \$Data(@tRef2,tRef2Value)
#; See if the \$Data values are the same for the two.
#; This is really only useful to detect if one of the arrays is undefined on the first pass;
#; \$Query only returns subscripts with data.
#; This will catch only one being defined, or one being an array and
#; ​the other being a regular variable.
If (tRef1Data '= tRef2Data) {
Set tEqual = 0
Set pMessage = "\$Data("_tRef1_")="_tRef1Data_"; \$Data("_tRef2_")="_tRef2Data
Quit
} ElseIf (tRef1Data#2) && (tRef2Data#2) {
#; See if the value at the subscript is the same for both arrays.
#; If not, they're not equal.
If (tRef1Value '= tRef2Value) {
Set tEqual = 0
Set pMessage = tRef1_"="_@tRef1_"; "_tRef2_"="_@tRef2
Quit
}
}

Set tRef1 = \$Query(@tRef1)
Set tRef2 = \$Query(@tRef2)
}
Quit tEqual
}```
```
• + 1
• 0
• 682
• 0
• 6

I dug up a pre-dynamic objects version of a utility method from a REST test and cleaned it up a bit (hopefully not introducing any bugs in the process):

``````ClassMethod compareArrays(ByRef actual, ByRef expected) As %Status [ PublicList = (actual, expected) ]
{
; compare root node
set deix=\$d(expected,eval),daix=\$d(actual,aval)
if deix'=daix {
}
if deix#2,aval'=eval {
}

set status=\$\$\$OK
set eix="expected",aix="actual"
for i=1:1 {
set eix=\$q(@eix),aix=\$q(@aix)
quit:""=eix&&(""=aix)

set seix="("_\$p(eix,"(",2,*),saix="("_\$p(aix,"(",2,*)
if seix'=saix {
set status=\$\$\$ERROR(\$\$\$GeneralError,"found """_aix_""" instead of """_eix_""" at position "_i)
quit
}

set deix=\$d(@eix,eval),daix=\$d(@aix,aval)
if deix'=daix {
set status=\$\$\$ERROR(\$\$\$GeneralError,"\$d(aix)="_daix_" instead of "_deix_" at position "_i)
quit
}

if deix#2,aval'=eval {
set status=\$\$\$ERROR(\$\$\$GeneralError,"actual("""_aix_"""))="""_aval_""" instead of """_eval_""" at position "_i)
quit
}
}
quit status
}
``````

Comparing them, I only see two things I prefer in my version. First, in this line of your method I would use four-argument \$piece with * as the fourth argument, just in case the subscript contains "first" or "second":

``````    If (\$Piece(tRef1,"first",2) '= \$Piece(tRef2,"second",2)) {
``````

Second, I would use a public list with first and second, rather than turning off procedure block for the entire method.

Actually I don't see any value for checking \$data for intermediate subscript (and check their consistency only at the most beginning of a function). Here is my [hopefully] simpler version

CompareArrays(refL, refR)
if \$data(@refL) '= \$data(@refR) {
// they are not consistent: one is non-array
return 0
}

do {
// fetch next data node subscript and it's value
set refL = \$query(@refL, 1, valueL), refR = \$query(@refR, 1, valueR)
if refL="" || (refR="") {
quit
}
set subL = \$qlength(refL), subR = \$qlength(refR)

if subL'=subR || (valueL '= valueR) {
return 0
}
// check each subscipt individually
for i=1:1:subL {
if \$qsubscript(refL, i) '= \$qsubscript(refR, i) {
return 0
}
}
while refL'="" && (refR'="")

// only after all checks passed
return refL=refR

DebugArrayCompare()
new
set m(1,1,1)=11,m(1,2)=12,m(2,1)=133
set n(1,1,1)=11,n(1,2)=12,n(2,1)=133
write \$\$CompareArrays(\$name(m),\$name(n)),!
set n(3,1)=0
write \$\$CompareArrays(\$name(m),\$name(n)),!
quit

Here's the one I thought up. Does not use indirection.

``````/// Returns true if arrays <var>pFirst</var> and <var>pSecond</var> have all the same subscripts and all
/// the same values at those subscripts. <br />
/// If <var>pFirst</var> and <var>pSecond</var> both happen to be either undefined or unsubscripted variables
/// returns true if they're both undefined or have the same value or one is undefined and the other empty
/// <var>pMessage</var> has details of the first difference found, if any.
ClassMethod CompareArrays2(ByRef pFirst, ByRef pSecond, Output pMessage) As %Boolean
{
Set pMessage = ""
Return:((\$Data(pFirst) '= 10) || (\$Data(pSecond) '= 10)) \$Get(pFirst) = \$Get(pSecond)
Merge First = pFirst
Merge Second = pSecond
Set Key = \$Order(First(""))

/// Iterate over first array
While (Key '= "") {

/// \$Data on undefined var does not modify second argument
Kill SecondVal

/// If the second array does not have the same subscript
/// or the values are different, quit
If ((\$Data(Second(Key), SecondVal) = 0) || (\$Get(SecondVal) '= First(Key))) {
Set pMessage = "Different subscripts at " _ Key
Return \$\$\$NO
} Else {
/// Otherwise remove this element from the second array
/// In here: Second(Key) = First(Key)
Kill Second(Key)
}
Set Key = \$Order(First(Key))
}

/// Second array should have no subscripts
/// If there are any, that means they are not present
/// in the first array, and so different
If \$Data(Second) = 10 {
Set pMessage = "Different subscripts at " _ \$Order(Second(""))
Return \$\$\$NO
}

Return \$\$\$YES
}``````

How is "best" defined here? If you wan't fastest, and shortest, I have two options for you. This following code also works with both locals and globals, and avoids the bug of using \$PIECE() to trim off the global/local name which won't work on globals which contain a "(" in their namespace (admittedly unlikely).

This is the fast version:

```        ; \$\$FDQ(\$NAME(a),\$NAME(b))
;       Find first different nodes in two trees (or subtrees). Will
;       work with locals or globals, except locals of the form % or
;       %<digit>. Returns a string containing the two references where
;       the first difference separated by "'=". If a node is found in
;       one tree that is not present in the other, the missing
;       reference is replaced by a question mark ("?"). If both trees
;       are the same, an empty string is returned.
;
FDQ(%1,%2)      ;                                                          
NEW %3,%4,%5,%6,%7,%8,%9,%0,% ;                                    
SET %3=\$DATA(@%1,%5)#10,%4=\$DATA(@%2,%6)#10
QUIT:%3'=%4||(%3&&(%5'=%6)) \$S(%3:%1,1:"?")_"'="_\$S(%2:b,1:"?") ;  
SET %7=%1,%8=%2,%3=\$QLENGTH(%1),%4=\$QLENGTH(%2)
lq      SET %1=\$QUERY(@%1,1,%5),%2=\$QUERY(@%2,1,%6) ;                      
SET:%1'=""&&(%7'=\$NAME(@%1,%3)) %1="" ;                            
SET:%2'=""&&(%8'=\$NAME(@%2,%4)) %2=""
QUIT:%1="" \$SELECT(%2="":"",1:"?'="_%2) QUIT:%2="" %1_"'=?" ;      
FOR %=1:1 SET %9=\$QS(%1,%3+%),%0=\$QS(%2,%4+%) Q:%9'=%0  Q:%9="" ;  
IF %9="",%0="" GOTO:%5=%6 lq QUIT %1_"'="_%2 ;                     
QUIT:%9]]%0 "?'="_%2 QUIT %1_"'=?" ;                               
; ------------
;   %1,%2 Reference to nodes under test.
;   %3,%4 Upto  used for Do %1,%2 exist (respectively)?
;             After  used for count of subscripts of %1,%2.
;       %5,%6 Values of %1,%2.
;       %7,%8 Copies of %1,%2 used to help find end subtree.
;       %9,%0 First different subscript of %1,%2.
;       %     Loop index for scanning down subscript list.
;   Return if the existence of %1 and %2 differ or if either exist
;       (doesn't matter which), and the values differ.
;   Go to next node on each side (which we know exist).
;   Check if we have moved past the end of the subtree.
;   If either or both %1,%2 put us at end of subtree, return.
;   Find the first different subscript or both will be "".
;   If both final subscripts "", subscripts are the same so check
;       values, and either return of loop.
;   Subscripts don't match, return determine order so we can return
;       node that is missing.
```

This version may take 30% longer in my test runs, but is a lot simpler by using recursion:
```        ; \$\$FDR(\$NAME(a),\$NAME(b))
;       Find first different nodes in two trees (or subtrees). Will
;       work with locals or globals, except locals of the for %, %1,
;       %2, %3, or %4. Returns a string containing the two references
;       where the first difference separated by "'=". If a node is
;       found in one tree that is not present in the other, the missing
;       reference is replaced by a question mark ("?"). If both trees
;       are the same, an empty string is returned.
;
FDR(%1,%2)      ;                                                          
NEW %3,%4,% ;                                                      
SET %3=\$DATA(@%1,%5)#10,%4=\$DATA(@%2,%6)#10
QUIT:%3'=%4||(%3&&(%5'=%6)) \$S(%3:%1,1:"?")_"'="_\$S(%2:b,1:"?") ;  
SET (%3,%4)=""
lr      SET %3=\$ORDER(@%1@(%3)),%4=\$ORDER(@%2@(%4)) Q:%3=""&&(%4="") "" ;  
IF %3=%4 SET %=\$\$FDR(\$NA(@%1@(%3)),\$NA(@%2@(%4))) G:%="" lr Q % ;  
QUIT:%3]]%4 "?'="_\$NAME(@%2@(%4)) QUIT \$NAME(@%1@(%3))_"'=?" ;     
; ------------
;   %1,%2 Reference to nodes under test.
;   %3,%4 Upto  used for Do %1,%2 exist (respectively)?
;             After  Subscripts of %1,%2.
;       %     Results of recursive call.
;   Return if the existence of %1 and %2 differ or if either exist
;       (doesn't matter which), and the values differ.
;   Go to next subscript at this level.
;   If the subscripts are the same, check the sub-tree
;       recursively. Loop or quit, depending upon finding a difference.
;   If subscripts differ, there is a missing node. Return the
;       missing one.
```

Not tested for speed, while I expect this version should be rather fast as it compares common parts of both references rather than individual suscripts. Enjoy!

```tttcmp(fgname,tgname,bKill,nErrTotal,nErrTop) ; Compare [sub]array @fgname with [sub]array @tgname
;In:
; fgname - "original" [sub]array
; tgname - its copy to check with;
; bKill - kill @tgname if it matches to @fgname (default = 0)
; nErrTop - # of mismatches to find to stop comparison
;
;Out:
; returns 1 on full subscripts and data match, else - 0.
; ByRef nErrTotal - # of mismatches.
;
new x,y,xtop,ytop,i,flOK,flQ,xquit,yquit,nErr,xstart,ystart
set bKill=\$get(bKill,0)
set nErrTop=\$get(nErrTop,1)
set x=fgname,y=tgname write !,"Comparing original "_fgname_" with imported "_tgname_":"
set xstart=\$length(\$name(@x,\$qlength(x)))+\$select(\$qlength(x):1,1:2)
set xtop=\$select(\$qlength(x):\$extract(x,1,\$length(x)-1)_",",1:x)
set ystart=\$length(\$name(@y,\$qlength(y)))+\$select(\$qlength(y):1,1:2)
set ytop=\$select(\$qlength(y):\$extract(y,1,\$length(y)-1)_",",1:y)
set flOK=1,flQ=0,nErr=0,nErrTotal=0
for i=1:1 do  quit:flQ
. set x=\$query(@x),xquit=x=""!(x'[xtop)
. set y=\$query(@y),yquit=y=""!(y'[ytop)
. if xquit,yquit write " OK. i=",i set flQ=1 quit
. if xquit!yquit write " NO!!!: i=",i,\$select(xquit:" "_fgname_" is shorter than "_tgname,1:" "_tgname_" is shorter than "_fgname) set nErrTotal=nErrTotal+1,flOK=0,flQ=1 quit
. if \$extract(x,xstart,\$length(x))'=\$extract(y,ystart,\$length(y)) write !,"!!! Ref NEQ: i=",i write !," x=",x,!," y=",y  set nErrTotal=nErrTotal+1,nErr=nErr+1,flOK=0 set:nErr'<nErrTop flQ=1 quit:flQ  ;!,\$e(x,xstart,\$l(x)),!,\$e(y,ystart,\$l(y)),
. if \$get(@x)'=\$get(@y) write !,"!!! Data NEQ: i=",i write !," *** x = ",x,!," x => ",@x,!," *** y = ",y,!," @y => ",@y set nErrTotal=nErrTotal+1,nErr=nErr+1,flOK=0 set:nErr'<nErrTop flQ=1 quit:flQ
. else  set nErr=0
if flOK,bKill write !,"Killing "_tgname_"..." kill @tgname
else  write !,"Not Killing "_tgname
quit flOK```

One more hint from Russian forum:

%GCMP  - Compares two globals in the same or different namespace.