How to find multiple substrings in a string optimally?
Example: I have a list of tags that I have to find, and a string with these and other tags separated by commas.
How to find the desired tags in the string optimally?
Result
Wouldn't $data on a local array be more efficient?
Run this code and check what's the best for you:
As a code optimisation exercise, there is a lot going on here: the two methods create two extra stack levels, and there are lots of unnecessary variables and extraneous operations such as $l and $p.
If you created a macro such as this...
then you will get around 25x more operations per second for negative tests and upwards of 100x for short-circuit positive tests, provided the highest-frequency tags are placed on the left.
There are a few ways to test the existence of a sub-string in a string...
Given...
and testing for "x"
within a tight loop with a million tests
$find is marginally quicker than the contains operator, completing a million operations in 0.004609 seconds, whilst the other two methods are slower. I say slower in relative terms only, because they are still quick in their own right.
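The timing code itself is not shown in this thread. As a rough sketch only, this is how the two fastest tests, $find and the contains operator ([), could be compared. The method name, the sample string and the loop count are assumptions made for illustration, and absolute timings will differ between machines.

```objectscript
/// Hypothetical benchmark; the method name and sample data are assumptions.
ClassMethod TimeSubstringTests(n As %Integer = 1000000)
{
    set tags = "a,b,c,d,e,f,g"      // sample data that does not contain "x"

    // $find returns 0 when the substring is absent,
    // otherwise the position just after the match
    set start = $zhorolog
    for i=1:1:n { set found = ($find(tags, "x") > 0) }
    write "$find:    ", $zhorolog - start, " s", !

    // The contains operator [ returns 1 or 0
    set start = $zhorolog
    for i=1:1:n { set found = (tags [ "x") }
    write "contains: ", $zhorolog - start, " s", !
}
```

Both loops carry the same overhead from the set command, so the difference between the two figures is more meaningful than either figure on its own.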
A wider question might be not which test is the fastest, but how it's implemented with multiple tag tests. For instance, a single variable creation is almost as expensive as the string test itself, and if you wrapped it all in a re-usable method, the method call would be 10x as expensive.
If you are dealing with two strings in the first instance, then a tight loop on $find is going to be your best option. The other functions are probably just as performant in the right context, but if you have to dissect the strings to implement them, then the cost is not in the test, but in the other operations that you have to do first.
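A minimal sketch of such a loop follows. It is not the macro discussed above. The method name FilterTags and its parameter names are hypothetical, and it assumes the tags are separated by the delimiter alone, with no surrounding spaces. Both strings are wrapped in delimiters so that a short tag cannot match inside a longer one.

```objectscript
/// Hypothetical helper: returns the tags from wanted that also occur in tags.
ClassMethod FilterTags(tags As %String, wanted As %String, dlm As %String = ",") As %String
{
    set result = ""
    // Pad with delimiters so only whole tags can match
    set padded = dlm _ tags _ dlm
    for i=1:1:$length(wanted, dlm) {
        set t = $piece(wanted, dlm, i)
        // $find returns 0 (false) when the delimited tag is absent
        if $find(padded, dlm _ t _ dlm) {
            set result = result _ $select(result="":"", 1:dlm) _ t
        }
    }
    return result
}
```

The wrapper method is there for readability only. As noted above, the method call can cost more than the tests themselves, so in a hot path the loop body would be inlined or moved into a macro.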
Wow. That(!) simple. Impressive. Bravo, Sean!
It's great, Sean! Thank you so much!
But over a long distance (a very, very long intag string) my approach with $D will start to win! (maybe :)
Actually, the case is very practical.
We need to know which of the tags in DC relate to InterSystems Products and Services.
Every post on DC has a Tags field, which is a comma-delimited string consisting of any of 150+ tags.
We need to form a Big Tag field, which is filtered Tags field with only the following values: Caché, Ensemble, HealthShare, Intersystems IRIS, DeepSee, iKnow, Atelier, Online Learning, Documentation, WRC.
E.g. this particular post has Tags: Beginner, Caché
Big Tags will be: Caché
My variant is the following. A general function that extracts from a tag string (tag) the subtags that appear in a given list of tags (intag):
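ClassMethod SubTag(intag, tag, dlm As %String) As %String
{
    // Index the wanted tags as subscripts of the local array intag
    for i=1:1:$L(intag,dlm) set intag($p(intag,dlm,i))=""
    // Collect every tag from the input string that is present in the index
    for i=1:1:$L(tag,dlm) {
        set t=$p(tag,dlm,i)
        if $D(intag(t)) set subtag($Seq(subtag))=t
    }
    // Join the collected tags into one delimited string held in the root node
    while $d(subtag($Seq(l))) {
        set $p(subtag,dlm,l) = subtag(l)
    }
    return $G(subtag)
}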
And the calling function:
ClassMethod BigTag(tag As %String) As %String
{
    set intag="Caché,Ensemble,HealthShare,Intersystems IRIS,DeepSee,iKnow,Atelier,Online Learning,Documentation,WRC"
    return ..SubTag(intag,tag,",")
}
Usage: