Pattern matching with '*' and '?'

I have an in-memory list of items and I want to check which items match my pattern string.

The pattern string is a comma-separated list of masks, which may contain special symbols like '*' and maybe '?'.

There's something similar in $system.OBJ.Compile: it accepts patterns such as "*.data.*,Sample.*", which would compile the 'Sample' package and all 'data' packages.

For example:

set list=$lb("abc", "c", "aaa", "bbb")
set result = ..match(list, "a*,*b")
zw result
result=$lb("abc","aaa","bbb")

Answers

Just some hints. I wrote a few helper functions for similar purposes.

MaskToPattern(mask,sw) ; mask with * and ? -> M pattern
 ; $$MaskToPattern^Idea.System.ZIF("ABC?D*") -> 1"ABC"1E1"D".E
 ; sw .. used for contains test (.E on both ends) : $$MaskToPattern^Idea.System.ZIF("ABC?D",1) -> .E1"ABC"1E1"D".E

and

MATCH(input,mask) ; matches input string against search mask

-> $$MATCH^Idea.System.ZIF("CSVZ","CST*")=0, $$MATCH^Idea.System.ZIF("CSVZ","CSV*")=1

MATCHOR(input,masklist) ; matches input string against list of search masks with OR condition (at least one)

-> $$MATCHOR^Idea.System.ZIF("CSVZ","CSV*,PF*")=1

MATCHAND(input,masklist) ; matches input string against list of search masks with AND condition (all must fit)

-> $$MATCHAND^Idea.System.ZIF("CSVZ","CSV*,PF*")=0


My implementation:

MaskToPattern(mask,sw) N pattern,char,pos S pattern="",char="" F pos=1:1:$L(mask) D
 . ; "*" -> any number of any characters; flush pending literal text first
 . I $E(mask,pos)="*" S pattern=pattern_$S(char="":"",1:"1"""_char_"""")_".E",char="" Q
 . ; "?" -> exactly one character; flush pending literal text first
 . I $E(mask,pos)="?" S pattern=pattern_$S(char="":"",1:"1"""_char_"""")_"1E",char="" Q
 . S char=char_$E(mask,pos)
 S pattern=pattern_$S(char="":"",1:"1"""_char_""""),char="" S:$G(sw) pattern=".E"_pattern_".E"
 Q pattern

MATCH(input,mask) Q input?@$$MaskToPattern^Idea.System.ZIF(mask)
MATCHOR(input,list) N ok,pie,mask S ok=0 F pie=1:1:$L(list,",") S mask=$P(list,",",pie) I mask'="",$$MATCH^Idea.System.ZIF(input,mask) S ok=1 Q
 Q ok
MATCHAND(input,list) N ok,pie,mask S ok=1 F pie=1:1:$L(list,",") S mask=$P(list,",",pie) I mask'="",'$$MATCH^Idea.System.ZIF(input,mask) S ok=0 Q
 Q ok

If someone is interested, here's the same code but as a class:

/// Utility to match input against comma-separated string of masks.
Class util.Matcher
{

/// Returns $$$YES if input matches at least one element from the list
/// input - string to match
/// list - comma-separated list of masks containing * and ?
/// write ##class(util.Matcher).MatchOr()
ClassMethod MatchOr(input As %String, list As %String) As %Boolean
{
    set ok = $$$NO
    for pie=1:1:$L(list,",") {
        set mask=$P(list,",",pie)
        if mask'="",..Match(input,mask) {
            set ok = $$$YES
        }
    }
    quit ok
}

/// Returns $$$YES if input matches all elements from the list
/// input - string to match
/// list - comma-separated list of masks containing * and ?
/// write ##class(util.Matcher).MatchAnd()
ClassMethod MatchAnd(input As %String, list As %String) As %Boolean
{
    set ok = $$$YES
    for pie=1:1:$L(list,",") {
        set mask=$P(list,",",pie)
        if mask'="",'..Match(input,mask) {
            set ok = $$$NO
        }
    }
    quit ok
}

/// Returns $$$YES if input matches the mask
/// write ##class(util.Matcher).Match()
ClassMethod Match(input As %String, mask As %String) As %Boolean [ CodeMode = expression ]
{
input?@..MaskToPattern(mask)
}

/// Translate mask into a pattern
/// write ##class(util.Matcher).MaskToPattern()
ClassMethod MaskToPattern(mask As %String) As %String
{
    set pattern = ""
    set char = ""
    for pos = 1:1:$length(mask) {
        set curChar = $extract(mask, pos)
        if curChar = "*" {
            set pattern = pattern _ $select(char="":"", 1:"1"""_char_"""") _ ".E"
            set char = ""
        } elseif curChar = "?" {
            set pattern = pattern _ $select(char="":"", 1:"1"""_char_"""") _ "1E"
            set char = ""
        } else {
            set char = char _ curChar
        }
    }
    set pattern = pattern _ $select(char="":"", 1:"1"""_char_"""")
    quit pattern
}

}
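
To tie this back to the original question (filtering a $list with "a*,*b"), a method along the following lines could be added to the class. This is only a sketch: the name Filter and its signature are assumptions, not part of the code posted above, and it relies on MatchOr behaving as shown.

```objectscript
/// Hypothetical addition to util.Matcher (name and signature are assumptions).
/// Returns the elements of list that match at least one mask from masks.
ClassMethod Filter(list As %List, masks As %String) As %List
{
    set result = ""
    set ptr = 0
    // Walk the list once; item receives each element in turn
    while $listnext(list, ptr, item) {
        if ..MatchOr(item, masks) {
            set result = result _ $listbuild(item)
        }
    }
    quit result
}
```

With the list from the question, set result = ##class(util.Matcher).Filter($lb("abc", "c", "aaa", "bbb"), "a*,*b") should give $lb("abc","aaa","bbb"), since "a*" becomes 1"a".E and "*b" becomes .E1"b".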

I'm not sure if this is the question, but I think:

"*.data.*" could be covered by ?.E1".data.".E (possibly with .E narrowed to .AN or .ANP).

"Sample.*" could become ?1"Sample.".E (the same applies to .ANP as above).

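As a quick check of those two patterns, something like the following could be typed at the terminal. The class names are made up purely for illustration.

```objectscript
 ; Hypothetical class names, used only to exercise the two patterns
 write "My.data.Cube"?.E1".data.".E     ; 1 (contains ".data.")
 write "Sample.Person"?1"Sample.".E     ; 1 (starts with "Sample.")
 write "Other.Person"?1"Sample.".E      ; 0 (does not start with "Sample.")
 write "Sample.Person"?1"Sample.".ANP   ; 1 (the rest is alphabetic only)
```
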
The input pattern is not constant, it's just an example.

I'd rather avoid parsing my input pattern and translating it into Cache pattern.

As you surely know, a pattern can be assigned to a variable:

 set list=$lb("abc", "c", "aaa", "bbb")
 set result = $$patmatch(list, "1(1(1.""a"".E),1(.E1.""b""))")
 ;set result = ..match(list, "a*,*b")
 zw result // result=$lb("abc","aaa","bbb")
 q

patmatch(pList, pPat)
 new result set result=""
 for i=1:1:$ll(pList) if $lg(pList,i)?@pPat set result=result_$lb($lg(pList,i))
 quit result

Comments

It sounds like you want to do pattern matching but, for some reason, don't want to use ObjectScript pattern match syntax for your mask. So you've decided to use * for "any number of any character" and ? for "a single character", both of which are pretty common conventions.

Remember that ObjectScript also has the $match Regex function. So your match() method could use that syntax for the search mask and you wouldn't have to write any code to change * into .E and ? into 1E and you'd get the benefit of additional options offered by $match.
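
As a rough sketch of that idea, the mask "a*,*b" from the question could be written as the regular expression "a.*|.*b" and passed straight to $match, which tests the whole string against the expression. The method name MatchRegex below is hypothetical and only illustrates the approach.

```objectscript
/// Hypothetical helper (name is an assumption): returns the elements of pList
/// that match the regular expression pRegex in full.
ClassMethod MatchRegex(pList As %List, pRegex As %String) As %List
{
    set result = ""
    set ptr = 0
    while $listnext(pList, ptr, item) {
        // $match returns 1 only if the entire string matches the expression
        if $match(item, pRegex) {
            set result = result _ $listbuild(item)
        }
    }
    quit result
}
```

Called as MatchRegex($lb("abc", "c", "aaa", "bbb"), "a.*|.*b"), this should return $lb("abc","aaa","bbb"). The trade-off is that callers have to write regex syntax ("." and "|" instead of "?" and ","), and literal dots in names such as "Sample.Person" need escaping.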