Question
· Feb 17, 2023

Global View and $ORDER not returning expected search results, (or what I want it to).

I know this problem something to do with sort and collation and string comparison vs numeric only strings, but I can't fathom out the details of it. 

I have the following global, where all values are suffixed with a space, I believe the intention is to prevent empty strings and force string comparisons rather than numeric. Which is fine when searching for an exact match, so searching for "ABC" internally looks for "ABC ", or "800999" looks for "800999 ". However one function tries to search for strings containing, (specifically beginning with), a partial code. So in the below data the example could be looking for prefixes 8009. 

Trying that in management portal using either "8009" or "8009 " gives different results and I don't fully understand how "8009 ": matches "800999 " but "8009": doesn't. I'm assuming that a pure number sorts before any strings, which means "8009" matches the string " " and if I kept paging through the entire data I'd come to the data I wanted. But that's loading and returning far more data than I want to process. 

I can limit the data manually by putting in an end range on management portal because I can look at the data and assess how to end range it.

however if I was to do that with a $O in code where the search prefix is user supplied I can't calculate the end range in code to limit the options. So I end up having to search through the entire global from the point of the match forwards. (I mean I suppose I could calculate ascii values of end characters and increment etc or TR replace the next character but I don't want to be getting into that sort of messy code unless I absolutely have to).

This is my test fragment of code where I found the error manifesting. 

findPOs
// test params
^WK3SORT($J)
^WK3SORT($J,"800999 ")=1
^WK3SORT($J,"800999")=1
^WK3SORT($J,"8009")=1
^WK3SORT($J,"8009 ")=1
%C=1
loopCountTest=0
loopCountSearch=0
"Testing PO Search",! SORT3 //Find all purchase orders that starts with the target string
POREF,LOOPREF
LOOPREF=""
S3L1 LOOPREF=$O(^WK3SORT($J,LOOPREF)) G:LOOPREF="" S3L1X
loopCountSearch=0
loopCountTest=loopCountTest+1
"Searching for :@"_LOOPREF_"@: ",!
(TargetPORef,POREF)=$$SORTCASE^UTL00EXT(LOOPREF) //_" "
S3L2 POREF=$O(^REXREF3(%C,POREF)) G:POREF="" S3L2X 
loopCountSearch=loopCountSearch+1
// If its not a match, continue searching
G:($E(POREF,1,$L(TargetPORef))'=TargetPORef) S3L2
"Match: "_POREF_" CONNO: "
CONNO=""
S3L3 CONNO=$O(^REXREF3(%C,POREF,CONNO)) G:CONNO="" S3L3X 
CONNO_","
S3L3
S3L3X
!
S3L2
S3L2X //
"Searched : " _ loopCountSearch,!
S3L1
S3L1X //
"Tested : " _ loopCountTest,!
Q

You can see below that the output is correct, however the searched number is huge, (for the number of records I need). 

Previously the function above was as follows: 

S3L2 POREF=$O(^REXREF3(%C,POREF)) G:POREF=""!($E(POREF,1,$L(TargetPORef))'=TargetPORef) S3L2X

where the string match exited the loop if the first results weren't a string match. The problem there is that this didn't work for fully numeric values, which is why I had to adapt it to the function as seen above.  I tried just adding the training space onto the search parameter but that didn't return any values at all (you can see where its commented out in the function). 

So, dear hive mind, what obvious piece of information am I missing? 

Product version: Caché 2018.1
$ZV: 2018.1.4.505.1
Discussion (12)2
Log in or sign up to continue

Thanks Robert. 

I understand that the value is being treated as numeric.  I couldn't get my head around how "8009 " matches "800999 " because the space should invalidate the match, but I've just realised that that space is a lower ascii value than the numerics and alpha characters. 

I don't understand your suggestion about using $QSUBSCRIPT, could you possibly provide a line of code to show how you're suggesting it would give me a range end. 

this shows how $QS could be used for a partial global listing
it only checks the second subscript and writes the value for a match

SAMPLES>k ^rcc
SAMPLES>f i=12:1:19 s ^rcc($j,i)="SUB1=J,"_i
SAMPLES>f i=12:1:15 f j=14,17 s ^rcc($j,j,i)="SUB1="_j_",SUB2="_i
 
SAMPLES>zw ^rcc
^rcc(9372,12)="SUB1=J,12"
^rcc(9372,13)="SUB1=J,13"
^rcc(9372,14)="SUB1=J,14"
^rcc(9372,14,12)="SUB1=14,SUB2=12"
^rcc(9372,14,13)="SUB1=14,SUB2=13"
^rcc(9372,14,14)="SUB1=14,SUB2=14"
^rcc(9372,14,15)="SUB1=14,SUB2=15"
^rcc(9372,15)="SUB1=J,15"
^rcc(9372,16)="SUB1=J,16"
^rcc(9372,17)="SUB1=J,17"
^rcc(9372,17,12)="SUB1=17,SUB2=12"
^rcc(9372,17,13)="SUB1=17,SUB2=13"
^rcc(9372,17,14)="SUB1=17,SUB2=14"
^rcc(9372,17,15)="SUB1=17,SUB2=15"
^rcc(9372,18)="SUB1=J,18"
^rcc(9372,19)="SUB1=J,19"
 
set glob="^rcc",last=14,start=13

SAMPLES>print
a        for  {
         set glob=$q(@glob,1,val) quit:glob=""
         set sub=$qs(glob,3)
         if sub=start write glob," >> ",val,!
         else  if sub]start,sub']last write glob," >> ",val,!
         else  write glob,!
         }

SAMPLES>d a
^rcc(9372,12)
^rcc(9372,13)
^rcc(9372,14)
^rcc(9372,14,12)
^rcc(9372,14,13) >> SUB1=14,SUB2=13
^rcc(9372,14,14) >> SUB1=14,SUB2=14
^rcc(9372,14,15)
^rcc(9372,15)
^rcc(9372,16)
^rcc(9372,17)
^rcc(9372,17,12)
^rcc(9372,17,13) >> SUB1=17,SUB2=13
^rcc(9372,17,14) >> SUB1=17,SUB2=14
^rcc(9372,17,15)
^rcc(9372,18)
^rcc(9372,19)
 

Thanks Robert

I still don't see how that addresses my question, but I understand what you're suggesting now. 
You're giving it the last value of the search, and you're also still looping through the entire global.  The first I don't have unless I calculate it and the second is what I'm trying to avoid.

I think my solution may lie in forcing the numeric search prefix into a string and going from there.  I'll have to go back and play. As is the oft the case, writing it all down to explain it has clarified a few things for me :)

Hi Mark,

When looping through numerics, unless you intervene, it will loop through every number in magnitude order until you reach the end. So, if you start at 8009 and 8010 exists then it will find it. And 8009 is not next to 800900001 by a long way.

When looping through strings, it's going to go through them like a human would when going through a dictionary. All words with the same prefix are next to each other. So you can easily stop your loop immediately when you find a word that doesn't match.

This code will probably do what you want, athough I haven't tested it:

Noddy ;; Find all subscripts with numeric prefix using minimum $ORDER
    ;
Find(Prefix=8009,MaxSubscriptlength=12)
    Prefix=+Prefix {
        // Must be numeric
        MatchLength=$l(Prefix)
        I=MatchLength:1:MaxSubscriptLength {
            // try starting from every number beginning with
            // 8009, thru 80090, 800900, up to say 800900000000
            // or whatever the maximum subscript length is
            Start=Prefix_$E("000000000000",0,I-MatchLength)
            F  {
                // see if we found one, even at the start of the loop
                $D(^REXREF3(1,Start)) {
                    // but it must have our prefix
                    // this test can end the loop!
                    Q:$E(Start,1,MatchLength)'=Prefix
                    // but only looking for numerics in this loop
                    // don't want to find again later
                    Q:Start'=+Start
                    // Now do whatever it is you want to do with it..
                    // <REMOVE MINE AND INSERT YOUR CODE HERE>
                    !,Start
                }
                Start=$O(^REXREF3(1,Start))
                Q:Start=""
            }
        }
    }
    // Now look for strings with that prefix
    // all strings starting with 8009 will immediately follow "8009 "
    // so start there, there won't be an numeric subscripts following a string
    Start=Prefix
    MatchLength=$l(Prefix)
    Prefix=+Prefix Start=Start_" "
    F  {
        // see if we found one, even at the start of the loop
        $D(^REXREF3(1,Start)) {
            // but it must have our prefix
            // this test can end the loop!
            Q:$E(Start,1,MatchLength)'=Prefix
            // Now do whatever it is you want to do with it..
            // <REMOVE MINE AND INSERT YOUR CODE HERE>
            !,Start
        }
        Start=$O(^REXREF3(1,Start))
        Q:Start=""
    }
    q
 

Hi Mark,

 in my experience mixed indexes (both with pure numbers and strings) are not a good choice if you need to perform something like a "STARTSWITH" function. 

In order to be able to always find your matching records near one to another and speed up the search, I think you have to normalize all the values by adding the space suffix to every code, assuming that values with and without space at the end refer to the same entity.
This way you can easily search for all the occurrence of a code that is starting with a specified prefix, without reading extra lines from the global.

Also you have to add a space at the prefix variable when searching if it is not present at the end of the passed value.

But if i look at your first screen, looks like you have no pure numbers in the second subscript of the global at the moment so I do not get the point.. 

Following you can find some of my tests:

d INIT^TESTRTN
^WK3SORT(1,0,9999)=""
^WK3SORT(1,801,307077)=""
^WK3SORT(1,802,306650)=""
^WK3SORT(1,800999,311404)=""
^WK3SORT(1," ",43017)=""
^WK3SORT(1," ",161692)=""
^WK3SORT(1," ",161693)=""
^WK3SORT(1," ",209576)=""
^WK3SORT(1,"80094NMGM ",528126)=""
^WK3SORT(1,"800999 ",305403)=""
^WK3SORT(1,"800999 ",305404)=""
^WK3SORT(1,"800999 ",311403)=""
^WK3SORT(1,"800999/T51316 ",364551)=""
^WK3SORT(1,"800999CAMDEN ",356252)=""
^WK3SORT(1,"801 ",307076)=""
^WK3SORT(1,"801 ",555780)=""
^WK3SORT(1,"80110 ",479476)=""
^WK3SORT(1,"802 ",306649)=""

d NORMALIZE^TESTRTN(1)
4 Lines Converted
^WK3SORT(1," ",43017)=""
^WK3SORT(1," ",161692)=""
^WK3SORT(1," ",161693)=""
^WK3SORT(1," ",209576)=""
^WK3SORT(1,"0 ",9999)=""
^WK3SORT(1,"80094NMGM ",528126)=""
^WK3SORT(1,"800999 ",305403)=""
^WK3SORT(1,"800999 ",305404)=""
^WK3SORT(1,"800999 ",311403)=""
^WK3SORT(1,"800999 ",311404)=""
^WK3SORT(1,"800999/T51316 ",364551)=""
^WK3SORT(1,"800999CAMDEN ",356252)=""
^WK3SORT(1,"801 ",307076)=""
^WK3SORT(1,"801 ",307077)=""
^WK3SORT(1,"801 ",555780)=""
^WK3SORT(1,"80110 ",479476)=""
^WK3SORT(1,"802 ",306649)=""
^WK3SORT(1,"802 ",306650)=""

d FIND^TESTRTN(1,"8009 ")
Order: 1-80094NMGM
  Detail: 528126
Order: 1-800999
  Detail: 305403
  Detail: 305404
  Detail: 311403
  Detail: 311404
Order: 1-800999/T51316
  Detail: 364551
Order: 1-800999CAMDEN
  Detail: 356252
 
5 lines read
TESTRTN
INIT
	//Create some data
	Kill ^WK3SORT
	//Strings
	Set ^WK3SORT(1," ",43017)=""
	Set ^WK3SORT(1," ",161692)=""
	Set ^WK3SORT(1," ",161693)=""
	Set ^WK3SORT(1," ",209576)=""
	Set ^WK3SORT(1,"80094NMGM ",528126)=""
	Set ^WK3SORT(1,"800999 ",305403)=""
	Set ^WK3SORT(1,"800999 ",305404)=""
	Set ^WK3SORT(1,"800999 ",311403)=""
	Set ^WK3SORT(1,"800999/T51316 ",364551)=""
	Set ^WK3SORT(1,"800999CAMDEN ",356252)=""
	Set ^WK3SORT(1,"801 ",307076)=""
	Set ^WK3SORT(1,"801 ",555780)=""
	Set ^WK3SORT(1,"80110 ",479476)=""
	Set ^WK3SORT(1,"802 ",306649)=""
	//Numbers
	Set ^WK3SORT(1,0,9999)=""
	Set ^WK3SORT(1,801,307077)=""
	Set ^WK3SORT(1,802,306650)=""
	Set ^WK3SORT(1,800999,311404)=""
	//Show content
	zw ^WK3SORT
	Quit
NORMALIZE(Sub)
	New (Sub)
	Set (cnt,End)=0
	Set Next=""
	Set Next=$Order(^WK3SORT(Sub,Next))
	While (Next'="")&(End=0) {
		//Check if code is pure number
		If (Next=+Next) {
			Merge ^WK3SORT(Sub,Next_" ")=^WK3SORT(Sub,Next)
			Kill ^WK3SORT(Sub,Next)
			Set cnt=$I(cnt)
		} Else {
			Set End=1
		}
		Set Next=$Order(^WK3SORT(Sub,Next))
	}
	Use 0 Write cnt_" Lines Converted",!
	//Show content
	zw ^WK3SORT
	Quit
FIND(Sub,Prefix)
	New (Sub,Prefix)
	//Setting the space char to normalize the search
	If $Extract(Prefix,$Length(Prefix))'=" " Set Prefix=Prefix_" "
	//Dealing with exact match
	If $Data(^WK3SORT(Sub,Prefix)) {
		Use 0 Write "Order: "_Sub_"-"_Prefix_" ",! 
		Do GETDETAILS(Sub,Prefix)
	}
	//Looping trough the records
	Set End=0,Cnt=0
	Set Next=$Order(^WK3SORT(Sub,Prefix))
	While (Next'="")&(End=0) {
		Set cnt=$Increment(cnt)
		//Test if current value starts with the Prefix string passed or if it is a single space char
		If ($Extract(Next,1,$Length(Prefix)-1)=$Extract(Prefix,1,$Length(Prefix-1)))!((Prefix=" ")&(Prefix=Next)) {
			Use 0 Write "Order: "_Sub_"-"_Next_" ",!
			Do GETDETAILS(Sub,Next)
		} Else {
			Set End=1
		}
		Set Next=$Order(^WK3SORT(Sub,Next))
	}
	Use 0 Write !,cnt_" lines read",!
	Quit
	//Extract detils subscripts
GETDETAILS(Sub,Ord)
	New (Sub,Ord)
	Set Det=""
	Set Det=$Order(^WK3SORT(Sub,Ord,Det))
	While Det'="" {
		Use 0 Write "  Detail: "_Det,!
		Set Det=$Order(^WK3SORT(Sub,Ord,Det))
	}
	Quit

Hi David 

You're right, there are no numeric values in the global, all values are suffixed with a space. The issue occurs when the actual value of the search string is numeric, (i.e. the prefix you're searching for has no alpha characters in, regardless of the state of the matching value in the global).

The code I had worked fine for any non numeric search strings but not with fully numeric ones. The logical solution would seem to be to add the space onto the search string to convert it to a string and then run the search, however that doesn't return the data expected. 

I think I need to take all the suggestions above and go away and play with another version of the search function, possibly combining a numeric and string search function to get the complete matching data set. 

Thanks for the suggestions. 

Thanks Timo

The codebase of this system is over 30 years old. They came up with suffixing as their solution and its used in a lot of places. As much as I can see prefixing as a better solution overall, there is no way to make that happen. So I'm stuck battling with what I've got. I've a few ideas now on where / how to approach a new search function for this class of data set, but no shining silver bullet. 

Hi. You say the codebase is over 30 years old. Well, I have a solution from 1991...

The PAS system I (still) work on has some standard search and display ("code list") software that has to return all codes "starting with", but it's not pretty. As the starting point for the $O() it does this:

	; Return a string immediately preceding input in collating sequence (returns null if input is null)
	; ABC becomes ABB||, -1 becomes -1.0000000001, -.1 becomes .1000000001, 0 becomes -.0000000001, .1 becomes .0999999999, 1 becomes 0.9999999999
SEED(A)	Q:A="" ""  ; null string
	I '$$NUMERIC(A) S LEN=$L(A),T=$E(A,1,LEN-1)_$C($A($E(A,LEN))-1)_"||" Q T
	Q A-(1E-10)

Since you know your target global only has non-numeric subscripts, you won't need to see the 5 lines of nastiness that is the NUMERIC call. :-) The end of your SEED=$O(@CLGREF@(SEED)) is either the usual "", or ZIN'=$E(SEED,1,$L(ZIN))  (again ignoring the horrible code dealing with numeric values).

Apologies for the ancient coding style (not mine, but I wrote similar back then).  / Mike