%STARTSWITH is not faster or slower when comparing apples to apples.
LIKE can find a substring wherever it occurs, and has multi-character and single-character wildcards. %STARTSWITH is looking only at the beginning of the string, so it's equivalent to LIKE 'ABC%'.
Updating to match another updated post lower on this page. If the comparison string is parameterized, LIKE sometimes does an extra check, so %STARTSWITH will be slightly faster.
When the comparison string ('ABC%' and 'ABC') is fixed. The code that checks LIKE 'ABC%' is exactly the same as the code that checks %STARTSWITH 'ABC'
If possible it's always recommended to use the Caché implemented predicates rather than the ANSI sql ones as they will normally always be faster. Execute the same queries in the management portal on large tables and you can verify those for yourself.
Maybe %STARTSWITH 'abc' was at one time faster than the equivalent predicate LIKE 'abc%'.
The quote comes from the FOR SOME %ELEMENT predicate documentation. This predicate can be used with Collections and an old feature called Free Text Search. The quote was actually only meant to apply to the Free Text Search usage.
I've tested %STARTSWITH 'abc' and LIKE 'abc%' today using FOR SOME %ELEMENT with Collections and Free Text Search. The code is identical.
Conclusion: the quote will be removed from the documentation since it's no longer true.
Thanks, @Vitaliy.Serdtsev, for making me realize that I should have been testing with placeholders rather than fixed values to the right of %STARTSWITH or LIKE. I was testing with Embedded SQL; with fixed values, my earlier statements are true. But if the query itself uses placeholders (? or host variables), or the WHERE clause is parameterized automatically (thanks, @Eduard Lebedyuk, for mentioning that) then the generated code differs, and LIKE sometimes does do an extra (slightly slower) comparison, because at runtime, LIKE could get a simple pattern ("abc%") or a complex one ("a_b%g_i") and the code has to cope with those possibilities.
New conclusion: the quote will be clarified so that it mentions placeholders/paramaterization and moved to the %STARTSWITH and LIKE documentation, instead of being buried in FOR SOME %ELEMENT.
Conclusion: the quote will be removed from the documentation since it's no longer true.
Then besides this, in the documentation for %STARTSWITH need to add the note DEPRECATED and the recommendation "use LIKE 'XXX%'"
I also did an analysis for Caché 2018.1
Class del.t Extends %Persistent
{
Index ip On p;
Property p As %VarString;
/// d ##class(del.t).Fill()
ClassMethod Fill(N = 1000000)
{
d DISABLE^%NOJRN
k ^del.tD,^del.tI
f i=1:1:N s ^del.tD(i)=$lb("","test"_i)
s ^del.tD=N
d ENABLE^%NOJRN
d ..%BuildIndices(,,,$$$NO)
d $system.SQL.TuneTable($classname(),$$$YES)
d $system.OBJ.Compile($classname(),"cu-d")
}
}
Although the plans are exactly the same in SMP the results of the metrics differ:
select count(*) from del.t where p like 'test7%'
Row count: 1 Performance: 0.291 seconds 333340 global references 2000537 lines executed
select count(*) from del.t where p %startswith 'test7'
Row count: 1 Performance: 0.215 seconds 333340 global references 1889349 lines executed
For the next two queries, the INT code matches:
&sql(select * from del.t where p like 'test7%')
&sql(select * from del.t where p %startswith 'test7')
But for these - is already different, so the metrics in SMP are different:
&sql(select * from del.t where p like :a)
&sql(select * from del.t where p %startswith :a)
In IRIS 2020.1, the work with embedded queries was changed, but I can't check it.
Forget.
Now %STARTSWITH all other things being equal is slightly faster than LIKE. This point is deeply hidden in the documentation, and it seems that this applies only to FOR SOME %ELEMENT.
If manage to speed up the special case for LIKE, then still need to correct/supplement the documentation.
I have never heard of anyone issuing the blanket statement that InterSystems predicates are faster or slower than the ANSI standard ones. I don't think there are that many predicates that have similar functionality. As I said in a different comment, %STARTWITH 'abc' is 100% equivalent to LIKE 'abc%'. InterSystems also provides %MATCHES and %PATTERN, but they are different.
Than you all for your replies. I heard the %STARTSWITH has better performance but I never know it is in the online document. However, I am a little confused: instead of recommending users to use something with which they are not familiar, why not make 'LIKE' faster?
I'm surprised you don't see the obvious performance difference of looking for something of distinct length at the beginning of a string vs. scanning an eventual rather long string for some bytes somewhere eventually including also composed strings as %AB%CD%.
keep in mind: Caché is built for speed, not for the comfort of the average programmer
> keep in mind: Caché is built for speed, not for the comfort of the average programmer
We work hard to make IRIS not only fast but comfortable too for any backend, full-stack, AI developer. Pinging @Raj Singh , our Product Manager on Developer Experience.
Today's compiler's optimizers detect the most of common expressions and generate the better performatic code, but we can't be obsessed with readability.
%STARTSWITH relates better to Caché internal structures and is faster in larger scale
%STARTSWITH is not faster or slower when comparing apples to apples.
LIKE can find a substring wherever it occurs, and has multi-character and single-character wildcards. %STARTSWITH is looking only at the beginning of the string, so it's equivalent to LIKE 'ABC%'.
Updating to match another updated post lower on this page. If the comparison string is parameterized, LIKE sometimes does an extra check, so %STARTSWITH will be slightly faster.
When the comparison string ('ABC%' and 'ABC') is fixed. The code that checks LIKE 'ABC%' is exactly the same as the code that checks %STARTSWITH 'ABC'
Well, the result is not always the same: I just found out that you should be careful when working with :variables, for example in %SQLQuery:
Returns 1 row for myvar being null.
Whereas
Does return no row for myvar being null.
Using IRIS for Windows (x86-64) 2021.1 (Build 215) Wed Jun 9 2021 09:56:33 EDT
AH ok
If possible it's always recommended to use the Caché implemented predicates rather than the ANSI sql ones as they will normally always be faster. Execute the same queries in the management portal on large tables and you can verify those for yourself.
The list of those are here: Caché https://cedocs.intersystems.com/latest/csp/docbook/DocBook.UI.Page.cls?KEY=RSQL_PREDICATE_CONDITONS
I was told that before but I never know it written in document. Thanks.
I did a little bit more research.
Maybe %STARTSWITH 'abc' was at one time faster than the equivalent predicate LIKE 'abc%'.The quote was actually only meant to apply to the Free Text Search usage.Conclusion: the quote will be removed from the documentation since it's no longer true.Thanks, @Vitaliy.Serdtsev, for making me realize that I should have been testing with placeholders rather than fixed values to the right of %STARTSWITH or LIKE. I was testing with Embedded SQL; with fixed values, my earlier statements are true. But if the query itself uses placeholders (? or host variables), or the WHERE clause is parameterized automatically (thanks, @Eduard Lebedyuk, for mentioning that) then the generated code differs, and LIKE sometimes does do an extra (slightly slower) comparison, because at runtime, LIKE could get a simple pattern ("abc%") or a complex one ("a_b%g_i") and the code has to cope with those possibilities.
New conclusion: the quote will be clarified so that it mentions placeholders/paramaterization and moved to the %STARTSWITH and LIKE documentation, instead of being buried in FOR SOME %ELEMENT.
And thanks to @Hao Ma for bringing this up!
Then besides this, in the documentation for %STARTSWITH need to add the note DEPRECATED and the recommendation "use LIKE 'XXX%'"
I also did an analysis for Caché 2018.1
Although the plans are exactly the same in SMP the results of the metrics differ:
select count(*) from del.t where p like 'test7%'
Row count: 1 Performance: 0.291 seconds 333340 global references 2000537 lines executed
select count(*) from del.t where p %startswith 'test7'
Row count: 1 Performance: 0.215 seconds 333340 global references 1889349 lines executed
For the next two queries, the INT code matches:
But for these - is already different, so the metrics in SMP are different:
In IRIS 2020.1, the work with embedded queries was changed, but I can't check it.
Hi Vitaly!
Why can't you check it? IRIS 2020.1 Docker version is available for everyone with a community edition.
Also, the Cloud version of Try IRIS is available too, but it is 2019.3 though.
Hi Evgeny!
I can't check for technical reasons. Docker version does not suit me.
Understand. Just curious - does Try IRIS work for you for testing purposes? Or are there any issues with Try IRIS? I mean we could fix it if any.
Yes, Try IRIS (which is 2019.3) works without problems.
I'm not sure what you mean here. %STARTSWITH executed fewer lines so why would we recommend LIKE instead?
Forget.
Now %STARTSWITH all other things being equal is slightly faster than LIKE. This point is deeply hidden in the documentation, and it seems that this applies only to FOR SOME %ELEMENT.
If manage to speed up the special case for LIKE, then still need to correct/supplement the documentation.
I have never heard of anyone issuing the blanket statement that InterSystems predicates are faster or slower than the ANSI standard ones. I don't think there are that many predicates that have similar functionality. As I said in a different comment, %STARTWITH 'abc' is 100% equivalent to LIKE 'abc%'. InterSystems also provides %MATCHES and %PATTERN, but they are different.
Than you all for your replies. I heard the %STARTSWITH has better performance but I never know it is in the online document. However, I am a little confused: instead of recommending users to use something with which they are not familiar, why not make 'LIKE' faster?
I'm surprised you don't see the obvious performance difference of looking for something of distinct length at the beginning of a string
vs. scanning an eventual rather long string for some bytes somewhere eventually including also composed strings as %AB%CD%.
keep in mind: Caché is built for speed, not for the comfort of the average programmer
I think that the author meant that the simplest queries of the form
like 'text%'
automatically worked as/converted to
%startswith 'text'
To answer that, before query is compiled all arguments are parametrized:
becomes
so we can't really replace LIKE with %STARTSWITH on code generation step (there's a brackets argument specification I suppose).
And on code execution step?
Thanks, Robert!
> keep in mind: Caché is built for speed, not for the comfort of the average programmer
We work hard to make IRIS not only fast but comfortable too for any backend, full-stack, AI developer. Pinging @Raj Singh , our Product Manager on Developer Experience.
I think the good sense is the key,
Today's compiler's optimizers detect the most of common expressions and generate the better performatic code, but we can't be obsessed with readability.
A good post: Performance vs Readability
Thanks Hao Ma, this sounds reasonable. Inviting @Benjamin De Boe , @Raj Singh to comment on this.
Could %CONTAINS be included in this discussion?
Sure, Andre!
What's wrong with %CONTAINS?
I think it deserves a separate question/discussion
Quote from the documentation:
Social networks
InterSystems resources
Log in or sign up
Log in or create a new account to continue
Log in or sign up
Log in or create a new account to continue
Log in or sign up
Log in or create a new account to continue