Question
· Jul 8, 2016

Best way to compare two arrays for "equality" (same subscripts and values)

I've been trying to write a method to compare two local variables, which may be arrays, for "equality" - that is, to see if they have all the same subscripts (if they're arrays) and values. This is the best I've come up with so far - are there any better/simpler approaches out there?

/// Returns true if arrays <var>first</var> and <var>second</var> have all the same subscripts and all
/// the same values at those subscripts. <br />
/// If <var>first</var> and <var>second</var> both happen to be either undefined or unsubscripted variables,
/// returns true if they're both undefined or have the same value.<br />
/// <var>pMessage</var> has details of the first difference found, if any.
Method CompareArrays(ByRef first, ByRef second, Output pMessage) As %Boolean [ ProcedureBlock = 0 ]
{
    New tEqual,tRef1,tRef2,tRef1Data,tRef1Value,tRef2Data,tRef2Value
    
    Set pMessage = ""
    Set tEqual = 1
    Set tRef1 = "first"
    Set tRef2 = "second"
    While (tRef1 '= "") || (tRef2 '= "") {
        #; See if the subscript is the same for both arrays.
        #; If not, one of them has a subscript the other doesn't, and they're not equal.
        If ($Piece(tRef1,"first",2) '= $Piece(tRef2,"second",2)) {
            Set tEqual = 0
            Set pMessage = "Different subscripts encountered by $Query: "_
                $Case(tRef1,"":"<end>",:tRef1)_"; "_$Case(tRef2,"":"<end>",:tRef2)
            Quit
        }
        
        Kill tRef1Value,tRef2Value
        Set tRef1Data = $Data(@tRef1,tRef1Value)
        Set tRef2Data = $Data(@tRef2,tRef2Value)
        #; See if the $Data values are the same for the two.
        #; This is really only useful to detect if one of the arrays is undefined on the first pass;
        #; $Query only returns subscripts with data.
        #; This will catch only one being defined, or one being an array and
        #; ​the other being a regular variable.
        If (tRef1Data '= tRef2Data) {
            Set tEqual = 0
            Set pMessage = "$Data("_tRef1_")="_tRef1Data_"; $Data("_tRef2_")="_tRef2Data
            Quit
        } ElseIf (tRef1Data#2) && (tRef2Data#2) {
            #; See if the value at the subscript is the same for both arrays.
            #; If not, they're not equal.
            If (tRef1Value '= tRef2Value) {
                Set tEqual = 0
                Set pMessage = tRef1_"="_@tRef1_"; "_tRef2_"="_@tRef2
                Quit
            }
        }
        
        Set tRef1 = $Query(@tRef1)
        Set tRef2 = $Query(@tRef2)
    }
    Quit tEqual
}
Discussion (6)1
Log in or sign up to continue

I dug up a pre-dynamic objects version of a utility method from a REST test and cleaned it up a bit (hopefully not introducing any bugs in the process):

ClassMethod compareArrays(ByRef actual, ByRef expected) As %Status [ PublicList = (actual, expected) ]
{
    ; compare root node
    set deix=$d(expected,eval),daix=$d(actual,aval)
    if deix'=daix {
        quit $$$ERROR($$$GeneralError,"$d(actual)="_daix_" instead of "_deix)
    }
    if deix#2,aval'=eval {
        quit $$$ERROR($$$GeneralError,"actual="""_aval_""" instead of """_eval_"""")
    }

    set status=$$$OK
    set eix="expected",aix="actual"
    for i=1:1 {
        set eix=$q(@eix),aix=$q(@aix)
        quit:""=eix&&(""=aix)

        set seix="("_$p(eix,"(",2,*),saix="("_$p(aix,"(",2,*)
        if seix'=saix {
            set status=$$$ERROR($$$GeneralError,"found """_aix_""" instead of """_eix_""" at position "_i)
            quit
        }

        set deix=$d(@eix,eval),daix=$d(@aix,aval)
        if deix'=daix {
            set status=$$$ERROR($$$GeneralError,"$d(aix)="_daix_" instead of "_deix_" at position "_i)
            quit
        }

        if deix#2,aval'=eval {
            set status=$$$ERROR($$$GeneralError,"actual("""_aix_"""))="""_aval_""" instead of """_eval_""" at position "_i)
            quit
        }
    }
    quit status
}

Comparing them, I only see two things I prefer in my version. First, in this line of your method I would use four-argument $piece with * as the fourth argument, just in case the subscript contains "first" or "second":

    If ($Piece(tRef1,"first",2) '= $Piece(tRef2,"second",2)) {

Second, I would use a public list with first and second, rather than turning off procedure block for the entire method.

Actually I don't see any value for checking $data for intermediate subscript (and check their consistency only at the most beginning of a function). Here is my [hopefully] simpler version

CompareArrays(refL, refR)
    if $data(@refL) '= $data(@refR) {
        // they are not consistent: one is non-array
        return 0
    }
    
    do {
        // fetch next data node subscript and it's value
        set refL = $query(@refL, 1, valueL), refR = $query(@refR, 1, valueR)
        if refL="" || (refR="") {
            quit
        }
        set subL = $qlength(refL), subR = $qlength(refR)

        if subL'=subR || (valueL '= valueR) {
            return 0
        }
        // check each subscipt individually
        for i=1:1:subL {
            if $qsubscript(refL, i) '= $qsubscript(refR, i) {
                return 0
            }
        }
    while refL'="" && (refR'="")

    // only after all checks passed
    return refL=refR


DebugArrayCompare()
    new
    set m(1,1,1)=11,m(1,2)=12,m(2,1)=133
    set n(1,1,1)=11,n(1,2)=12,n(2,1)=133
    write $$CompareArrays($name(m),$name(n)),!
    set n(3,1)=0
    write $$CompareArrays($name(m),$name(n)),!
    quit
 

Here's the one I thought up. Does not use indirection.

/// Returns true if arrays <var>pFirst</var> and <var>pSecond</var> have all the same subscripts and all
/// the same values at those subscripts. <br />
/// If <var>pFirst</var> and <var>pSecond</var> both happen to be either undefined or unsubscripted variables
/// returns true if they're both undefined or have the same value or one is undefined and the other empty
/// <var>pMessage</var> has details of the first difference found, if any.
ClassMethod CompareArrays2(ByRef pFirst, ByRef pSecond, Output pMessage) As %Boolean
{
    Set pMessage = ""
    Return:(($Data(pFirst) '= 10) || ($Data(pSecond) '= 10)) $Get(pFirst) = $Get(pSecond)
    Merge First = pFirst
    Merge Second = pSecond
    Set Key = $Order(First(""))
    
    /// Iterate over first array
    While (Key '= "") {
        
        /// $Data on undefined var does not modify second argument
        Kill SecondVal
        
        /// If the second array does not have the same subscript
        /// or the values are different, quit
        If (($Data(Second(Key), SecondVal) = 0) || ($Get(SecondVal) '= First(Key))) {
            Set pMessage = "Different subscripts at " _ Key
            Return $$$NO
        } Else {
            /// Otherwise remove this element from the second array
            /// In here: Second(Key) = First(Key)
            Kill Second(Key)
        }
        Set Key = $Order(First(Key))
    }
    
    /// Second array should have no subscripts
    /// If there are any, that means they are not present
    /// in the first array, and so different
    If $Data(Second) = 10 {
        Set pMessage = "Different subscripts at " _ $Order(Second(""))
        Return $$$NO        
    }
    
    Return $$$YES
}

How is "best" defined here? If you wan't fastest, and shortest, I have two options for you. This following code also works with both locals and globals, and avoids the bug of using $PIECE() to trim off the global/local name which won't work on globals which contain a "(" in their namespace (admittedly unlikely).

This is the fast version:

        ; $$FDQ($NAME(a),$NAME(b))
        ;       Find first different nodes in two trees (or subtrees). Will
        ;       work with locals or globals, except locals of the form % or
        ;       %<digit>. Returns a string containing the two references where
        ;       the first difference separated by "'=". If a node is found in
        ;       one tree that is not present in the other, the missing
        ;       reference is replaced by a question mark ("?"). If both trees
        ;       are the same, an empty string is returned. 
        ;
FDQ(%1,%2)      ;                                                          [10]
        NEW %3,%4,%5,%6,%7,%8,%9,%0,% ;                                    [20]
        SET %3=$DATA(@%1,%5)#10,%4=$DATA(@%2,%6)#10
        QUIT:%3'=%4||(%3&&(%5'=%6)) $S(%3:%1,1:"?")_"'="_$S(%2:b,1:"?") ;  [30]
        SET %7=%1,%8=%2,%3=$QLENGTH(%1),%4=$QLENGTH(%2)
lq      SET %1=$QUERY(@%1,1,%5),%2=$QUERY(@%2,1,%6) ;                      [40]
        SET:%1'=""&&(%7'=$NAME(@%1,%3)) %1="" ;                            [50]
        SET:%2'=""&&(%8'=$NAME(@%2,%4)) %2=""
        QUIT:%1="" $SELECT(%2="":"",1:"?'="_%2) QUIT:%2="" %1_"'=?" ;      [60]
        FOR %=1:1 SET %9=$QS(%1,%3+%),%0=$QS(%2,%4+%) Q:%9'=%0  Q:%9="" ;  [70]
        IF %9="",%0="" GOTO:%5=%6 lq QUIT %1_"'="_%2 ;                     [80]
        QUIT:%9]]%0 "?'="_%2 QUIT %1_"'=?" ;                               [90]
        ; ------------
        ; [10]  %1,%2 Reference to nodes under test.
        ; [20]  %3,%4 Upto [30] used for Do %1,%2 exist (respectively)?
        ;             After [30] used for count of subscripts of %1,%2.
        ;       %5,%6 Values of %1,%2.
        ;       %7,%8 Copies of %1,%2 used to help find end subtree.
        ;       %9,%0 First different subscript of %1,%2.
        ;       %     Loop index for scanning down subscript list.
        ; [30]  Return if the existence of %1 and %2 differ or if either exist
        ;       (doesn't matter which), and the values differ.
        ; [40]  Go to next node on each side (which we know exist).
        ; [50]  Check if we have moved past the end of the subtree.
        ; [60]  If either or both %1,%2 put us at end of subtree, return.
        ; [70]  Find the first different subscript or both will be "".
        ; [80]  If both final subscripts "", subscripts are the same so check
        ;       values, and either return of loop.
        ; [90]  Subscripts don't match, return determine order so we can return
        ;       node that is missing.
 
This version may take 30% longer in my test runs, but is a lot simpler by using recursion:
        ; $$FDR($NAME(a),$NAME(b))
        ;       Find first different nodes in two trees (or subtrees). Will
        ;       work with locals or globals, except locals of the for %, %1,
        ;       %2, %3, or %4. Returns a string containing the two references
        ;       where the first difference separated by "'=". If a node is
        ;       found in one tree that is not present in the other, the missing
        ;       reference is replaced by a question mark ("?"). If both trees
        ;       are the same, an empty string is returned. 
        ;
FDR(%1,%2)      ;                                                          [10]
        NEW %3,%4,% ;                                                      [20]
        SET %3=$DATA(@%1,%5)#10,%4=$DATA(@%2,%6)#10
        QUIT:%3'=%4||(%3&&(%5'=%6)) $S(%3:%1,1:"?")_"'="_$S(%2:b,1:"?") ;  [30]
        SET (%3,%4)=""
lr      SET %3=$ORDER(@%1@(%3)),%4=$ORDER(@%2@(%4)) Q:%3=""&&(%4="") "" ;  [40]
        IF %3=%4 SET %=$$FDR($NA(@%1@(%3)),$NA(@%2@(%4))) G:%="" lr Q % ;  [50]
        QUIT:%3]]%4 "?'="_$NAME(@%2@(%4)) QUIT $NAME(@%1@(%3))_"'=?" ;     [60]
        ; ------------
        ; [10]  %1,%2 Reference to nodes under test.
        ; [20]  %3,%4 Upto [30] used for Do %1,%2 exist (respectively)?
        ;             After [30] Subscripts of %1,%2.
        ;       %     Results of recursive call.
        ; [30]  Return if the existence of %1 and %2 differ or if either exist
        ;       (doesn't matter which), and the values differ.
        ; [40]  Go to next subscript at this level.
        ; [50]  If the subscripts are the same, check the sub-tree
        ;       recursively. Loop or quit, depending upon finding a difference.
        ; [60]  If subscripts differ, there is a missing node. Return the
        ;       missing one.

Not tested for speed, while I expect this version should be rather fast as it compares common parts of both references rather than individual suscripts. Enjoy!

tttcmp(fgname,tgname,bKill,nErrTotal,nErrTop) ; Compare [sub]array @fgname with [sub]array @tgname
;In:
; fgname - "original" [sub]array
; tgname - its copy to check with;
; bKill - kill @tgname if it matches to @fgname (default = 0)
; nErrTop - # of mismatches to find to stop comparison
;
;Out:
; returns 1 on full subscripts and data match, else - 0.
; ByRef nErrTotal - # of mismatches.
;
new x,y,xtop,ytop,i,flOK,flQ,xquit,yquit,nErr,xstart,ystart
set bKill=$get(bKill,0)
set nErrTop=$get(nErrTop,1)
set x=fgname,y=tgname write !,"Comparing original "_fgname_" with imported "_tgname_":"
set xstart=$length($name(@x,$qlength(x)))+$select($qlength(x):1,1:2)
set xtop=$select($qlength(x):$extract(x,1,$length(x)-1)_",",1:x)
set ystart=$length($name(@y,$qlength(y)))+$select($qlength(y):1,1:2)
set ytop=$select($qlength(y):$extract(y,1,$length(y)-1)_",",1:y)
set flOK=1,flQ=0,nErr=0,nErrTotal=0
for i=1:1 do  quit:flQ
. set x=$query(@x),xquit=x=""!(x'[xtop)
. set y=$query(@y),yquit=y=""!(y'[ytop)
. if xquit,yquit write " OK. i=",set flQ=1 quit
. if xquit!yquit write " NO!!!: i=",i,$select(xquit:" "_fgname_" is shorter than "_tgname,1:" "_tgname_" is shorter than "_fgname) set nErrTotal=nErrTotal+1,flOK=0,flQ=1 quit
. if $extract(x,xstart,$length(x))'=$extract(y,ystart,$length(y)) write !,"!!! Ref NEQ: i=",write !," x=",x,!," y=",y  set nErrTotal=nErrTotal+1,nErr=nErr+1,flOK=0 set:nErr'<nErrTop flQ=1 quit:flQ  ;!,$e(x,xstart,$l(x)),!,$e(y,ystart,$l(y)),
. if $get(@x)'=$get(@y) write !,"!!! Data NEQ: i=",write !," *** x = ",x,!," x => ",@x,!," *** y = ",y,!," @y => ",@set nErrTotal=nErrTotal+1,nErr=nErr+1,flOK=0 set:nErr'<nErrTop flQ=1 quit:flQ
. else  set nErr=0
if flOK,bKill write !,"Killing "_tgname_"..." kill @tgname
else  write !,"Not Killing "_tgname
quit flOK