Discussion (26)5
Log in or sign up to continue

%STARTSWITH is not faster or slower when comparing apples to apples.

LIKE can find a substring wherever it occurs, and has multi-character and single-character wildcards. %STARTSWITH is looking only at the beginning of the string, so it's equivalent to LIKE 'ABC%'.

Updating to match another updated post lower on this page. If the comparison string is parameterized, LIKE sometimes does an extra check, so %STARTSWITH will be slightly faster.

When the comparison string ('ABC%' and 'ABC') is fixed. The code that checks LIKE 'ABC%' is exactly the same as the code that checks %STARTSWITH 'ABC'

Well, the result is not always the same: I just found out that you should be careful when working with :variables, for example in %SQLQuery:

select 1 where 'well...' %startswith :myvar

Returns 1 row for myvar being null.

Whereas

select 1 where 'well...' like :myvar||'%';

Does return no row for myvar being null.

Using IRIS for Windows (x86-64) 2021.1 (Build 215) Wed Jun 9 2021 09:56:33 EDT

If possible it's always recommended to use the  Caché  implemented predicates rather than the ANSI sql ones as they will normally always be faster. Execute the same queries in the management portal on large tables and you can verify those for yourself.

The list of those are here:  Caché  https://cedocs.intersystems.com/latest/csp/docbook/DocBook.UI.Page.cls?KEY=RSQL_PREDICATE_CONDITONS
 

I did a little bit more research.

  • Maybe %STARTSWITH 'abc' was at one time faster than the equivalent predicate LIKE 'abc%'.
  • The quote comes from the FOR SOME %ELEMENT predicate documentation. This predicate can be used with Collections and an old feature called Free Text Search. The quote was actually only meant to apply to the Free Text Search usage.
  • I've tested %STARTSWITH 'abc' and LIKE 'abc%' today using FOR SOME %ELEMENT with Collections and Free Text Search. The code is identical.

Conclusion: the quote will be removed from the documentation since it's no longer true.

Thanks, @Vitaliy.Serdtsev, for making me realize that I should have been testing with placeholders rather than fixed values to the right of %STARTSWITH or LIKE. I was testing with Embedded SQL; with fixed values, my earlier statements are true. But if the query itself uses placeholders (? or host variables), or the WHERE clause is parameterized automatically (thanks, @Eduard Lebedyuk, for mentioning that) then the generated code differs, and LIKE sometimes does do an extra (slightly slower) comparison, because at runtime, LIKE could get a simple pattern ("abc%") or a complex one ("a_b%g_i") and the code has to cope with those possibilities.

New conclusion: the quote will be clarified so that it mentions placeholders/paramaterization and moved to the %STARTSWITH and LIKE documentation, instead of being buried in FOR SOME %ELEMENT.

And thanks to @Hao Ma for bringing this up!

Conclusion: the quote will be removed from the documentation since it's no longer true.

Then besides this, in the documentation for %STARTSWITH need to add the note DEPRECATED and the recommendation "use LIKE 'XXX%'"

I also did an analysis for Caché 2018.1

Class del.t Extends %Persistent
{

Index ip On p;

Property As %VarString;

/// d ##class(del.t).Fill()
ClassMethod Fill(1000000)
{
  DISABLE^%NOJRN
  ^del.tD,^del.tI

  i=1:1:^del.tD(i)=$lb("","test"_i)
  ^del.tD=N
  ENABLE^%NOJRN

  ..%BuildIndices(,,,$$$NO)
  d $system.SQL.TuneTable($classname(),$$$YES)
  d $system.OBJ.Compile($classname(),"cu-d")
}
}

Although the plans are exactly the same in SMP the results of the metrics differ:

select count(*from del.where like 'test7%'
Row count: 1 Performance: 0.291 seconds 333340 global references 2000537 lines executed

select count(*from del.where %startswith 'test7'
Row count: 1 Performance: 0.215 seconds 333340 global references 1889349 lines executed

For the next two queries, the INT code matches:

&sql(select * from del.where like 'test7%')
&sql(select * from del.where %startswith 'test7')

But for these - is already different, so the metrics in SMP are different:

&sql(select * from del.where like :a)
&sql(select * from del.where %startswith :a)

In IRIS 2020.1, the work with embedded queries was changed, but I can't check it.

Then besides this, in the documentation for %STARTSWITH need to add the note DEPRECATED and the recommendation "use LIKE 'XXX%'"

select count(*from del.where like 'test7%'
Row count: 1 Performance: 0.291 seconds 333340 global references 2000537 lines executed

select count(*from del.where %startswith 'test7'
Row count: 1 Performance: 0.215 seconds 333340 global references 1889349 lines executed

I'm not sure what you mean here. %STARTSWITH executed fewer lines so why would we recommend LIKE instead?

I have never heard of anyone issuing the blanket statement that InterSystems predicates are faster or slower than the ANSI standard ones. I don't think there are that many predicates that have similar functionality. As I said in a different comment, %STARTWITH 'abc' is 100% equivalent to LIKE 'abc%'. InterSystems also provides %MATCHES and %PATTERN, but they are different.

I'm surprised you don't see the obvious performance difference of looking for something of distinct length at the beginning of a string
vs. scanning an eventual rather long string for some bytes somewhere eventually including also composed strings as %AB%CD%.

keep in mind: Caché is built for speed, not for the comfort of the average programmer